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Abstract 

We present an iterative technique for finding zeroes of vector fields on Rie- 
mannian manifolds. As a special case we obtain a "nonlinear averaging algo- 
rithm" that computes the centroid of a mass distribution /x supported in a set 
of small enough diameter D in a Riemannian manifold M. We estimate the 
convergence rate of our general algorithm and the more special Riemannian av- 
eraging algorithm. The algorithm is also used to provide a constructive proof 
of Karcher's theorem on the existence and local uniqueness of the center of 
mass, under a somewhat stronger requirement than Karcher's on D. Another 
corollary of our results is a proof of convergence, for a fairly large open set of 
initial conditions, of the "GPA algorithm" used in statistics to average points 
in a shape-space, and a quantitative explanation of why the GPA algorithm 
converges rapidly in practice; see [TT] . 

We also show that a mass distribution in M with support Q has a unique 
center of mass in a (suitably defined) convex hull of Q. 

2000 AMS Subject ClassiGcation: Primary 53B21, 60D05; secondary 53C99 
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1 Introduction. 



In this article we present an iterative technique for finding zeroes of vector fields on 
Riemannian manifolds, and apply this technique to the averaging of a mass distri- 
bution with support contained in a sufficiently small ball in a Riemannian manifold. 
Our approach provides a new and constructive proof of Karcher's theorem on the ex- 
istence and uniqueness of the center of mass, under a somewhat stronger requirement 
on the radius of the supporting ball than was used in 

This study was originally motivated by curiosity about a method (the "GPA algo- 
rithm") used in statistics to find the average, suitably defined, of a sample of shapes. 
In many areas of image analysis, particularly in biological applications such as cardio- 
graphy (cf. [2Z1) and maps of the brain (cf. this average is the starting point for 
understanding "normal" shapes and deviations from the norm. In practical applica- 
tions the averaging algorithm tends to converge remarkably quickly, often stabilizing 
to desired precision after two or three iterations (cf. P, Figure 5 (p. 22), or |H] 
Table 3 (p. 307)). The initial purpose of our study was to understand the geom- 
etry underlying this algorithm and, in quantitative terms, why the convergence in 
practical applications is so rapid. In exploring this the author found that the GPA 
algorithm has a more general interpretation on Riemannian manifolds, generalizing 
to a technique for finding local zeroes of a vector field. The technique is an itera- 
tive algorithm that we show is closely related to Newton's method and mimics the 
contracting-mapping proof of the Inverse Function Theorem. 

As a special case of this technique, we obtain a general Riemannian averaging 
algorithm. The vector field used in this algorithm has a unique local zero, assuming 
the diameter D of the support of distribution being averaged is not too large, and is 
"almost linear" near this zero if D is small, explaining the rapid convergence. This 
zero is exactly the Riemannian center of mass of the distribution being averaged. 
In sections 0] and El of this paper we quantify "not too large" and "small" , giving 
sufficient conditions for convergence of the algorithm and estimating the convergence 
rate. 

The Riemannian averaging algorithm can in principle be applied to any "nonlin- 
ear averaging" problem in which the objects being averaged are parametrized by a 
Riemannian manifold, and is easily implemented in spaces for which the exponential 
map and its inverse are explicitly known (e.g. Riemannian submersions from spheres, 
and certain homogeneous spaces with invariant metrics). This is exactly the situa- 
tion for the shape- averaging problem. The (Euclidean) shape space is the space 
of configurations of k non-identical labeled points in R", modulo equivalence under 
translations, rotations, and dilations (rescalings) in R"; sometimes one also allows re- 
fiections. The size- and- shape space is defined similarly, but one does not mod out 
by rescalings. These spaces can naturally be given the structure of manifolds with 
singularities, with natural Riemannian metrics on their smooth parts (f7| |2l ITS]). 
Averaging (sizes-and-) shapes can be viewed as averaging certain mass distributions 
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on (size-and-) shape spaces, namely finite lists of points with normalized counting 
measure. In the probability and statistics literature there is a commonly accepted 
definition of mean size- and- shape, the Procrustean mean size- and- shape, but several 
possible definitions of mean shape (see \21\ P- 292 and [22]), the most common of 
which may be the Procrustean mean shape used in [22] ■ However, while the Pro- 
crustean mean size-and-shape as defined in the probability and statistics literature 
agrees with the Riemannian center of mass, the Procrustean mean shape does not. 

The GPA (Generalized Procrustes Analysis) algorithm as described in lives 
intrinsically on size-and-shape space; call this algorithm GPA-SS. To obtain from 
this an algorithm that averages shapes, one first embeds shape-space into size- 
and-shape space in a standard way, carrying the list Q of shapes to be averaged to 
a list t{Q) of sizes-and-shapes. One then produces a sequence of in by applying the 
GPA-SS algorithm to l{Q). Finally one projects the limit (if there is one) back onto 
shape space. Call this set of steps GPA-S. Le proves in j23] that if the shapes in Q are 
not too far apart in E^, and if the sequence in converges, then the limit in is 
the Procrustean mean size-and-shape of the list i{Q). It is not hard to show that this 
projection of the Procrustean mean size-and-shape is exactly the Procrustean mean 
shape ([235 P- 54), so that GPA-S computes the Procrustean mean shape. 

Although the literature contains many discussions of the GPA-SS and other GPA- 
derived algorithms, at the time this paper was first completed ^01 the literature con- 
tained no theorems giving sufficient conditions for any of these algorithms to converge. 
However, as we show in [TJ , the GPA-SS algorithm is exactly our Riemannian averag- 
ing algorithm as applied to size-and-shape space. Hence convergence of the GPA-SS 
algorithm, for an explicitly describable open set of initial conditions, is an immediate 
corollary of the Riemannian-averaging theorems in sections H] and [21 of this paper. 
After PU] was written, p5], which contains some overlapping results, appeared. 

In the iterative part of the GPA-S algorithm, one can obtain a sequence of points 
in shape space by projecting each point in the GPA-SS sequence, rather than just the 
limit, back onto shape space. (This sequence in shape space can also be described 
slightly more intrinsically; see [H], where we discuss the application of the results 
of this paper to Procrustean averaging in more detail.) In this way one obtains an 
iterative algorithm GPA-S' on shape space itself. GPA-S' does not coincide with the 
Riemannian averaging algorithm on shape space — it cannot, since it converges (for 
suitable initial conditions) to the Procrustean mean shape and not to the Riemannian 
average. However, GPA-S' is an algorithm of the more general type also considered 
here, and therefore its convergence, again for an explicitly describable open set of 
initial conditions, follows directly from our more general theorems in section |2l as 
well as from the fact that GPA-S converges. 

In this paper we also address the question of why the convergence of the GPA 
algorithms is so rapid in practice. As has been noted by many authors, the data 
sets averaged in practical applications tend to be very concentrated sets in shape (or 
size-and-shape) space; their diameter D is very small compared with any length-scale 
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derivable from the geometry of shape (or size-and-shape) space. Our theorems in 
section |31 show why, for small convergence is rapid. 

To describe our results more concretely, we need some notation and terminology: 

Definition 1.1 Let (A, rf^), ^b) be metric spaces and let k G [0,1). We call a 
map F : A ^ B a contraction with constant n if dB{F{x), F{y)) < k dA{x,y) for all 
x,y E A. 

The results of this paper are proved using a version of the Contracting Mapping 
Theorem f Theorem 12. 1|) . The maps we use arise from certain vector fields, perhaps 
defined only locally, on Riemannian manifolds. To describe these maps, let V be the 
Levi-Civita connection on a Riemannian manifold {M,g), not assumed complete. If 
X is a vector field defined on some open set V C M, then at each point p E V 
we can view the covariant derivative VX as a linear transformation TpM TpM, 
namely v t— > V^X. Call X nondegenerate on a subset U G V if this endomorphism 
(VX)p is invertible for all p E U . When referring to bounds on (VX)p ^ and other 
linear transformations, throughout this paper we use the operator norm: ||T|| = 
sup||^ll=i 

A vector field X defined on an open set in M and nondegenerate on a subset 
U defines a map $x '■ U M hy 

<l>x(p) = expp(-(VX);%), (1.1) 

assuming that expp(— (VX)~^Xp) is defined for all p E U. (In this paper we use both 
Xp and X{p) to denote the value of a vector field X at at a point p.) Note that 
zeroes of X are fixed-points of $x, and if ||X|| is not too large pointwise then the 
converse is true as well. One of the results of this paper is the following theorem, a 
much stronger version of which is proven in §2. 

Theorem 1.2 Let {M,g) be a Riemannian manifold and let U G M be open. Given 
e > 0, ki > 0, k2 > 0, let ^eMMi^) denote the set of nondegenerate vector fields X on 
U satisfying the following conditions pointwise on U: (i) ||X|| < e, (ii) IKVX)"-*^!! < 
fcf^, and (Hi) ||VVX|| < ^2. // both e/cf^ and k2k^^ are sufficiently small, and 
X E X^^k-^^k^iU) , then $x : U M is a contraction, where the distance function on 
U is the one determined by the Riemannian metric g on M. If U is a ball B of radius 
p centered at po, and if p is sufficiently small and e,ki,k2 are as above, then there 
exists a positive ei < e such that if ||X(po)|| < ^i, then $x preserves B and hence has 
a unique fixed point p in B; the point p is also the unique zero of X in B. For all p 
in some possibly smaller open ball centered at po, the iterates ($x)"'(p) converge to p. 

Example 1.3 Euclidean space R". Since T^R" = R" canonically for all x E R", 
a vector field X on R" can be naturally identified with a vector-valued function 
G : R" — s> R", and the Levi-Civita connection is just given by ordinary directional 
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differentiation: (VX)^{v) = {DG\x){v) = ^G{x + tv)\t=o- The exponential map is 
given simply by exp^(t') = x + v. Thus 

<!>x{x)=x-{DGU)-\Gix)), 

which is exactly the Newton's-method map used in the usual contracting-mapping 
proof of the Inverse Function Theorem; cf. §4.9. 

Example n illustrates the close relationship between the iteration in Theorem 11.21 
and Newton's method. However, one gains considerable flexibility by not requiring 
quite so strict a relationship as in p.lj) . looking more generally at maps of the form 
p I— s> expp(yp) := \l/y(p) for suitable vector fields Y. Our approach will focus on 
maps of this more general form, deducing consequences for maps of the form $x as a 
special case. For the maps \l/y, the size restriction on ||VX|| and || VVX|| is replaced 
by the single condition that at each point the endomorphism VY be close to minus 
the identity. Note that in this case, —{'VY)~^Y is close to Y, so that the maps $y 
and \l/y are themselves close. Iterative schemes based on maps of the form \l/y are 
thus a natural generalization of Newton's method. Our most general result for these 
maps and their associated algorithms is Theorem 12. 8| a stronger version of Theorem 
ll.2l in which all the "sufficiently smalls" are quantified for the maps \l/y and $x- One 
corollary is the following: 

Corollary 1.4 Let 5 < A G R, ri G R, and suppose that the sectional curvature K 
of M satisfies 6 < K < A. There exists a number Dent, depending only on S,A, 
and Ti, such if ^ is a probability distribution supported on a set Q G M of diameter 
less than Dent, and the local convexity radius at some point of Q is at least ri, then 
the primary center of mass q of ^ exists, and the Riemannian averaging algorithm 
converges to q for every initial point q E Q. 

The definition of -Dcrit in terms of 5, A, and ri is given in §4 (see ()4.18|) ): the "primary 
center of mass" is defined in §3. 

We use the exponential map in defining \l/y because of its universality, but in 
specific examples "exp" can be replaced by other maps defined on a neighborhood of 
the zero-section of the tangent bundle. This is convenient in the shape-space setting 
for the algorithm GPA-S'; see jTTj. However, any continuous map F : {U C M) M 
can always be expressed in the form exp oy , with Y continuous, provided that for 
all p G the distance d{p,F{p)) is less than the local injectivity radius at p (see 
Definition 12. 4j) . Thus if we are interested only in maps that have any chance of 
having fixed points, we can always restrict attention to maps of the form \l/y. 

This paper is organized as follows. In §2 we study the maps \l/y and derive con- 
ditions for iterative algorithms based on these maps to converge. Before specializing 
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to the Riemannian averaging algorithm, some discussion of Riemannian centers of 
mass is needed; this is given in §3, where we also define the vector field Y on which 
the averaging algorithm is based. In general a probability distribution on a man- 
ifold (even one supported on a finite set) can have more than one center of mass, 
depending on how "center of mass" is defined, but under certain circumstances one of 
these is distinguished. In statistics this is typically done using least-squared-distances 
minimization. However, we offer a more directly geometric way of singling out a "pri- 
mary" center of mass, using convex hulls. We digress a bit in Section 3 from the 
main contracting-mapping theme because, surprisingly, we have not found any dis- 
cussion of the relation of Riemannian centers of mass to convex hulls anywhere in the 
center-of-mass literature, although the idea seems very natural. Our final statement 
concerning this relationship. Corollary 13.131 may be a fact known to workers in the 
field but it is a stronger explicit statement than we have seen elsewhere. 

In §4 we apply the results of §2 to obtain a constructive proof of the existence 
and uniqueness of the center of mass of a probability distribution /i with suffi- 
ciently support in a ball of sufficiently small radius p (Corollary 14. 7^ . Karcher's 
existence/uniqueness theorem has a less stringent requirement on p, and its unique- 
ness statement has been strengthened by W. S. Kendall JH]- In view of these results, 
the most important feature of the contracting-mapping approach to the center-of-mass 
problem is not that it gives existence and uniqueness of the average, but that it pro- 
vides a constructive algorithm for finding it (Theorem I4.8|l . along with convergence- 
rate estimates. The restriction on p in Theorem 14.81 is almost certainly not sharp. If 
the map on which the algorithm is based has a certain convexity property that we 
call "tethering" , then the upper limit on p can be increased considerably. Tethering 
may occur fairly generally, but the author has no proof of this. Thus the results in 
sections 4-6 are stated both without and with the assumption of tethering. 

In §5 we estimate the convergence rate of algorithms of the form "iterate \E'y" for 
general F , and show that the rate is completely controlled by bounds on W + /. In 
general the convergence of the sequence {pn = ^y(Po)} is exponential; if ||VF + /|| < 
ei then dipn+i^Vn) < d{pi,pQ)e^. For maps of the form $x the convergence is much 
faster, obeying the same bounds that one has for Newton's method in Euclidean space. 
For the Riemannian averaging algorithm we obtain something in between: exponential 
convergence, but with a constant ei that is 0{D^), where D is the diameter of the 
support of the distribution being averaged. We also combine the convergence-rate 
result with W. S. Kendall's uniqueness result to obtain a sharpening of Theorem 14.81 
(Theorem 15. 3p . establishing convergence of the algorithm under a weaker requirement 
on p. 

The statement that ei is 0{D'^) heuristically — and only heuristically — explains 
the rapid convergence of the CPA algorithms; it does not fully explain why CPA 
algorithms converges rapidly in any applications (or determine in advance whether 
they will), since asymptotics do not tell us how small D must be before the leading 
asymptotic term decently approximates the actual convergence rate. However, Theo- 
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rem 15 .31 can be used to give bounds on ei of the form ei < cD^ (for all D less than the 
critical diameter in the theorem, not just for small D), where c is computable from 
the geometry of M. In IJHI we carry this out and give a universal worst-case estimate 
of the convergence rate when the curvature of M is non-negative, which is the case 
in all shape space and size- and- shape space applications. 

In the appendix (3Z|) we prove (or cite proofs of) certain facts used in §i2Hll 
concerning Jacobi fields and the distance function. 

2 Zeroes of Vector Fields 

Throughout this paper, M denotes a smooth connected manifold equipped with a 
Riemannian metric g. The induced distance function on M x M is denoted (ijv/(", 
or simply d{-,-) when no ambiguity can arise. M is always regarded as a metric 
space with this distance function, and the closure of a subset ?7 in M is denoted U . 
Bp{p) C M denotes the open ball of radius p centered at p. If U C M is connected, 
du denotes "distance within U" , the infimum of lengths of curves in U connecting 
two given points of U. TM denotes the tangent bundle of M, and vr : TM M the 
canonical projection. X and Y denote vector fields on M that are at least and 

respectively. If A^^i, N2 are manifolds and F : Ni ^ N2 is a. smooth map, then for 
p E Ni, we let F*p : TpNi — > Tf(p)N2 denote the derivative of F at p. The identity 
map of any space is denoted /. 

The main theorems of this paper are deduced from the following corollary of the 
standard Contracting Mapping Theorem (cf. [26 Corollary 4.9.2). 

Theorem 2.1 (Contracting Mapping Theorem) Let B = Bp^po) be an open ball 
in a metric space {A,d), with {B,d) complete. Suppose that B C U C A, that 
F : U ^ A is a contraction with constant n, and that d{pQ, F{po)) < {1 — K)p. Then F 
preserves B and has a unique fixed point p. Furthermore p G B and limn-,00 F'"'{q) = p 
for all q E B. 

As in the Euclidean case (Example P), in the general case $x (and more generally 
\&y) turns out to be a contraction on sets on which ||X|| (more generally \\Y\\) is 
sufficiently small. Our proof of this fact relies on the following simple fact. 

Lemma 2.2 Let U, M be connected Riemannian manifolds and let n < 1. If F : 
U ^ M is a map satisfying 

\\F^p\\ < n for all p e M (2.1) 

then F is a contraction with constant k. 

Proof: For any curve 7 in f/ connecting p to q, (12. Ij) implies i{F o 7) < ^£(7), where 
i denotes arclength. | 
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We will prove that $x is a contraction (on suitable sets) by computing its deriva- 
tive and applying Lemma 12. 2[ The map $x is of the form exp oY, where F is a 
vector field on M. Below we express the derivatives of the maps F : M — > TM and 
exp : TM — > M in terms of the horizontal- vertical splitting of T{TM) induced by the 
Levi-Civita connection V. We first review this splitting (see also fH], Appendix B). 

Given a curve 7 in M starting at a point p (i.e. a map 7 from some interval of 
the form (— e, e) to M with 7(0) = p), a lift of 7 starting a.t w E TpM is a curve 7 
with TT o 7 = 7 and 7(0) = w — i.e. a vector field along 7 whose value at p is w. A 
lift 7 is horizontal if this vector field is parallel (V-y'(t)7 = 0). Every curve 7 has a 
unique horizontal lift starting a a given w G T^(o)M, and the vector 7'(0) G T^iTM) 
depends only on 7'(0). Hence the map 7'(0) ^— 7'(0) is well-defined and at each 
w G TM uniquely determines a horizontal lift v G Ty^{TM) of each v G TpM, where 
p = Tr{w). The horizontal subspace of T^iTM) is defined to be the subspace if^ 
consisting of all horizontal lifts to w of vectors in TpM, and 'k^w\h^ '■ TpM 
is an isomorphism. The vertical subspace of T^(TM) is the tangent space to the 
fiber TpM at w. The subspace is canonically isomorphic to TpM (identifying a 
vertical vector ^{w + tv)\t=o with v); we denote the inverse of this isomorphism by 
L : TpM —>■ Vu,{TM). The horizontal and vertical subspaces provide a splitting of 
TuiiTM): for every u G T^iTM), there exist unique vectors a,b E TpM such that 
u = a + L{h) (specifically a = n^w and b = l~^{w — a)); we write a = hor('u) and 
L{b) = veTt{u). 

The derivatives we need will be expressed in terms of Jacobi fields (vector fields 
J along geodesies 7 satisfying the Jacobi equation 

Vy Vy J = Riem(7', 7)7'; (2.2) 

see |1] §1.4 or ^H] Appendix A). Below, for w,a, 6 G TM with the same base-point, 
let J}"ab) denote the Jacobi field J along 7^ with J(0) = a, (Vy^J){0) = b, where 7^ 
is the unique geodesic with initial velocity w (i.e. 7^(t) = exp^(^)(tw)). 
Throughout this paper we will be concerned with maps of the form 

= expoY -.U ^ M, (2.3) 

where F is a vector field on some domain U C M. In ()2.3|) we view F as a map 
U TM and assume that image(y) C domain(exp). The derivative (^I/y)* is given 

by 

{^yUv = Jf,,o)(l) + (expp),^(.((V.r)p)) G r*,(p)^ (2-4) 

where w = Yp] the formula above can be deduced from |16j Appendix B. If X is a 
nondegenerate vector field and we define $x M as in , then for the vector 

field Y = -{VXy^X we have 

VY = (VX)-^ o{VVX)o{VX)-^X - I. (2.5) 
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Thus particular case of ()2.4|) we have 

i^xUiv) = jJo)(l) - (exp^UMv)) + (expp)*y,(6((VX)-i o (V.VX) o (VX)-iX)), 

(2.6) 

where X, VX, and VX are evaluated at p. 

The term {expp)^Yp{i-{v)) in ()2.(jp is itself the value of a Jacobi field, namely J^q ^^(1) 
where w = Yp. Hence fl2.6|) can be written as 

i^xUiv) = Jl{l) + {exv.UMZv)) (2.7) 

where = (VX)~^(V^VX)p(VX)~^Xp and where is the Jacobi field along 7^ 

with the "antidiagonal" initial conditions J(0) = — (VyJ)(0) = f. In Euclidean 
space this Jacobi field always vanishes at time 1, and (exp^),, is the identity after 
appropriate identifications are made as in Example ^ so that (as is well known) $x 
is a contraction if at each point ||X|| is small enough in terms of ||(VX)^^|| and 
II VVX||. In the general case we can again make ||(expp)*(i(Zp))|| arbitrarily small by 
taking ||Xp|| sufficiently small. Additionally, ||Xp|| small implies small, implying 
that the geodesic ■jYp is short. For sufficiently short geodesies, the map v || J^(l)|| 
is arbitrarily close to the corresponding map on Euclidean space, namely the zero 
map. (We will prove a stronger version of this fact in Lemma f2.3l below. ) Hence it is 
already clear that if sup^ ||Xp|| is sufficiently small on a set U, then {^x)\u will be a 
contraction. 

The essential ingredient in the preceding argument is that $x is a map of the form 
\E'y = exp oY for some vector field Y whose covariant derivative is close to minus the 
identity (pointwise) whenever ||y|| is small enough. (The prototypical example is the 
radial vector field — 2;*^ on R", whose covariant derivative is identically — /•) In 
computational situations it may be costly to invert VX, so we will analyze the more 
general maps ^'y, and deduce results for maps of the form as a special case. 

For some applications (e.g. those in jH]), it is useful to know the explicit de- 
pendence of our eventual contraction constants on background geometric parameters, 
so we keep track of this dependence carefully — leading unavoidably to longer formu- 
las than if we were only aiming at qualitative results. Certain special functions will 
appear, all of which are related to the analytic (entire) functions c, s defined by 

c(z)=y^- — -, s(^) = V- -. (2.8) 

^ ^ ^ (2n)V ^ ' ^ (2n + l)\ ^ ' 

n=0 ^ ' n=0 ^ ' 

Since the definitions and properties of the relevant functions are scattered through 
the text, for reference Table 1 lists the functions and the properties used. 

To estimate ||(\E'y)*||, we rewrite fl2.4|l as 

{^y\,v = + (expp),y^(6((VF|p + I)v)). (2.9) 
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Table of Special Functions 





U-tiililli^ iUi lliLllcl 


pi upei iieb ubeci 


c{z), z eC 


Er=o^V(2n)! 










X e [0, oo) 


\Z\X ) — S\^X ) 

= cosh X — x~^ sinh x 


mono. 1, 0_(x;) > 


(/)+(a;), X e [0,37r/4) 


( ^2'\ ( 

a^^ — X J — C^^ — X J 

= x;~^ sin a; — cosx 


mono, t, 0+(x) > 


Ci(A,r),A e R,r > 


1, if A > 

|;;i/2^ sif A<o 


mono, t in each variable, 
Ci(A,r) > 1 


/i(A,r), A e R, 

r [0,7r) if A>0, 
\ [0, oo) if A < 


c(-Ar2)/s(-Ar2) 


/i(0,r) = /i(A,0) = 1, 

/i(A,r) > if A < 0, 

or if A > and A^V < 7r/2 


h_{x) — h{—l, x), X & [0, oo) 


a; coth x 


mono, t) 

/i_(x) > /i_(0) = 1 


hQ{x) = h{0, x), X E [0, oo) 


1 




h+{x) = h{l,x), X e [0,7r) 


xcotx 


mono. J,, 

/i+(x) < /i+(0) = 1 


ip{X,r), same domain as h 


sign(A)(l-MA,r)) 


"^(A, r) > 0, mono, t 

in A and r, 

convex in each variable 


V'max((^,A,r), 5<AeR, 
r e [0, oo) 


max(V^(A, r), '^(5, r)) 


mono. 1 in A and r, 
mono. J, in 5, 

convex in each variable, 

V'maxl^, A, 0) = 



Table 1: In this table and throughout this paper our convention for hmctions that are given for 
a; 7^ by formulas such as "a;"^ sin a;)" are extended to a; = by continuity. When monotonicity or 
convexity of a multivariable function is stated with respect to one variable, the other variables are 
assumed fixed. 
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We will first analyze the Jacobi fields J^. 

Notation. For any subset U C M, let A{U) and S{U) denote, respectively, the 
supremum and the infimum of the sectional curvatures of {U,g\u)] let ji^'Kf/) = 
max(|A(6'^)|, |5(f/)|). For a curve 7 we simply write A(7) for A(Im(7)), etc. Then 
we have the following proposition. The inequality ()2.1()j) below can be derived from 
Karcher's elegant (and more general) Jacobi-field bounds; see jTE] pp. 534-535, 539. 
However, for the special case ()2.10j) . we give a short, direct proof in the Appendix 
f ^7.1|) . In the second part of ^7.11 we show how the proof leads directly to ()2.12|) . 

Proposition 2.3 Let p G M, let 7 : [0, 1] — >■ M 6e a geodesic of length r starting at 
p, and for each v G TpM let J„ be the Jacobi field along 7 with the "antidiagonal" 
initial conditions ( jt;(0), (Vy j^)(0)) = {v,—v). Let denote the component of v 
perpendicular to y {0) . Then 



||X(l)||<0_(r|K|(7)^/^)||t;^|| (2.10) 

where 

, / N 1 / X sinh X . . X 

^_(x) = cosh(a;) . (2.11) 

X 

If M is a locally symmetric space of nonnegative curvature, and A(7)^/^r < 37r/4, 
this bound can be sharpened to 

||J.(1)|| <0+(rA(7)^/^)||t;^l| (2.12) 



where 



, smx 
(p+\x) = cosx. 

X 



(2.13) 



The "37r/4" in the locally-symmetric case can be increased to approximately .STvr 
(see the discussion of fl7.5p in ^T.ip ) but any instances in which rA(7)^/^ > 7r/2 are 
irrelevant for all uses in this paper. 

Turning our attention to the second term in (j2.9|) . we have 

||(expp),y^(.(Vr|, + /)i;)|| < ||(exp^),yj| ||(Vr|, + /)|| 



We recall the following terminology. 



Definition 2.4 The local injectivity radius at p G M is nnj(p) '■= sup{p | exp^ : 
{Bp{0) C TpM) ^ M is defined and is a diffeomorphism onto its image}; rinj(-) is 
a positive continuous function on M. For any subset U C M, we define r'inj(^) = 
infpg[7{rinj(p)}. When U = M this infimum is called the injectivity radius of {M,g). 
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Definition 2.5 A subset U G M is convex (respectively, strongly convex) if for all 
p,q & U (resp., for all p G f/, g G U) there is a unique minimal geodesic segment 7 
in M from p to q, and 7 — {g} lies entirely in U^. For each p E M we define the 
local convexity radius rcvx(p) := sup{p < rinj(p) | -Bp(p) is convex}; for f/ C M we let 
rcvx{U) = infpg[/{rcvx(p)}- Like the local injectivity radius, the local convexity radius 
of a point (or of a closed set) is always positive ([T^ Lemma 1.6.4). 

Convexity is relevant because we want to apply Lemma 12.21 and Theorem 12.11 to 
the case U C M. The lemma only gives us a contraction from the metric space {U, du) 
to (M, (ijv/)- However if U is convex then du = dM so that Theorem 12. II applies. 

For w G TpM with ||u'|| < r-^^^ij)) the norm of (expp)^,^^, can be bounded in terms 
of curvature and 

llexpp,^ II < Ci(5(7), \\w\\) 
where 7 is the geodesic from p with 7'(0) = 1 and where 

f 1, A>0 

(see ^S] estimate CI). Thus if the image of 7 lies in a set f/, and 

\\{e^vMM^y\v + I)v)\\<Cr{5{U),\\Y,\\) \\{VY + 1\ 

Assembling the pieces above, we have the following corollary. 

Corollary 2.6 Let (M, g) he a Riemannian manifold, p > 0, p G M, and B = Bp{p). 
Assume that p < r^vxiB) and that \K\{B) < 00. 

(a) There exists e > 0, depending only on r^^^^B) and the sectional curvature of 
{B,g), such that if Y is a vector field defined on B, with \\Y\\ < e and ||Vy + /|| < e 
pointwise on B, then Y has a unique zero in B, namely lim„^oo(^y)"'('?) for any 
qeB. 

(b) Let ki,k2 > 0. There exists e > 0, depending only on ki, k2,rcvxiB) , and the 
sectional curvature of{B,g), such if X is a vector field X satisfying ||(VX)^-'^|| < ki, 
||VVX|| < k2, and \\X\\ < e pointwise on B, then X has a unique zero in B, namely 
lim„^oo("^'x)"(g) for any q E B. 

Remark 2.7 We intentionally avoid assuming that that {M,g) is complete or has 
positive injectivity radius. In the application to the set of smooth points of the shape 
space E^, if n > 3 then (M, g) is a dense open subset of a non-smooth real algebraic 
variety (cf. j2]), hence neither complete nor of positive injectivity radius. However, 
any closed subset of M with positive injectivity radius will be complete. In particular 
this applies to the closures of all the balls considered in this paper. 

^In the differential geometry literature there is little consistency in the meanings attached to the 
terms "convex set" and "strongly convex set" . There is quite an array of criteria one can imagine 
demanding of a convex set; see Definition 13 . II for a few of these. 



(2.14) 

(2.15) 

^injip), then 
ll^ll- (2-16) 
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Corollary 12.61 follows immediately from the following more quantitative version. 

Theorem 2.8 Let U C M be connected and let \K\ = \K\{U),6 = 6{U). Define the 
functions 4>-{-) and Ci(-, ■) by ()2.11|1 and ()2.15|1 . Assume either of the following sets 
of hypotheses: 

Case 1. Y is a vector field defined on U and at each point of U we have \\Y\\ < 
eo < nnj(f^) and || VF + /|| < ei. Define \l/y = exp oY as in \2.'J^) . 

Case 2. fci, k2> X is a vector field defined and uniformly nondegenerate on U , 
and at each point of U we have ||(VX)^^|| < k]^^ , ||VVX|| < k2, and \\X\\ < e < 
fcirinj(f/). Define $x = exp o(-(VX)-i o X) as m ni\) . 

Then: 

(a) For all p E U, in Case 1 we have 

||(^y)*p|| < /t(*y) := ct>^{\K\'/h^) + Ci(5,eo)ei, (2.17) 

while in Case 2 

mxUW < ^^{^x):=<P^{\K\"\k^') + Ci{5,ek^')k2k^h. (2.18) 

In Case 1, let F = \l/y; in Case 2 let F = $x- In Case 1 (respectively Case 2) ifeQ, ci 
are small enough (resp., e is small enough) that k{F) < 1, then F : {U, du) (M, du) 
is a contraction with constant n{F), and therefore has at most one fixed point in U . 
IfU contains an open ball B = Bp{pQ) on whose closure the distance functions du-, dM 
coincide (a condition satisied by every subset of U if U is convex), and if 

||r(po)|| < (1 - ^^{F\b))p (m Case 1), (2.19) 

or 

||X(po)|| < (1 - k(F|b))A;iP (m Case 2), (2.20) 

then F : U ^ M has a unique fixed point, and this fixed point lies in B . Equivalently, 
the vector field Y in Case 1, or X in Case 2, has a unique zero in U , and this zero 
lies in B . Assuming i2.19\) or \2.2L\) as appropriate, F preserves B, and the fixed 
point is lim„^oo -^"(q') for any q & B. 

(b) If M is a locally symmetric space of non-negative curvature bounded above 
by A, then in (j2.17|) and (j2.18|) . we can replace the right-hand sides by the smaller 
bounds 

«:sym+(^y) := M^'^'^o) + (2.21) 
and Ksym+i^x) ■■= M^^^^^K^)+k2k^^e (2.22) 

respectively, provided A^^'^eo < 37r/4 in the first case and A^^'^ek^^ < Sn/A in the 
second. 
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Proof: (a) Case 1. The bound ()2.17|) follows from Proposition 12.31 and ()2.16|) . If 
k(\I'y) < 1 Lemma 12.21 implies that \E'y : {U,du) [M^du) is a contraction with 
constant n. To use the fixed-point theorem we need a contraction with respect to a 
single distance function. However, the assumption that du = dM on B implies that 
the restriction of \E'y to i? is a ^-contraction from {B^du) — ^ c^m)- As noted 
in Remark 12.71 the metric space {B,dM) is complete. Hence the result follows from 
Theorem O with U = B. 

Case 2. Letting Y = -(VX)'^X, ioi p e U we have ||Fp|| < ek^^ < rmj{U), and 
from (j2.5p we have ||(Vy + /)p|| < A;f^fc2- Hence Case 2 follows from Case 1. 

(b) This follows from (|2.12j) and the proof of (a). | 



Remark 2.9 In the bound ()2.18|) . as either e ^ or [i^l ^ 0, we have ^-di^l^/^e/^f "'^) 
and Ci((5, e/cf^) — > 1. Hence, as one would hope, for small e and for small \K\ the 
bound ()2.18j) is asymptotic to k2k^'^e, the well-known bound for the Euclidean case 
(see the discussion following ()2.7|) ). 

Remark 2.10 Theorem II .21 follows immediately from Case 2 of Theorem I2.8r a). 

3 Averaging Points in a Riemannian Manifold 

In its most elementary form, averaging is something that one does to a finite list of 
elements in a vector space. The average of a list {wi, . . . , Wm} in a vector space V 
can be uniquely characterized as that vector w G V for which 

m 

J2iwi-^) = 0. (3.1) 

1=1 

The "balancing property" (|3.1|) motivates the alternative term for the average, center 
of mass. If V is given any inner product then, using the the inner product to define 
a norm, the average above can also be uniquely characterized as 



w = that vector v which minimizes 




1=1 

(the "least-squares property"). 

Unlike the balancing property, which requires a linear structure on V, the least- 
squares property makes sense if V is replaced by any metric space. A Frechet mean 
of a finite subset of a metric space {A, d) is an element a & A at which the function 
p \—>- Ylq£Q d{q.iPY attains an absolute minimum. In general a Frechet mean need not 
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exist or be unique, but when it exists uniquely it is not unreasonable to call it the 
average of Q. 

Modulo existence and uniqueness, Frechet means give a way to extend the notion 
of "average" to finite lists of points in a Riemannian manifold, or more generally 
probability distributions on such a manifold. This idea of the Riemannian center 
of mass dates back at least as far as E. Cartan [3, in the case of simply connected 
manifolds of nonpositive curvature; in this setting the Frechet mean of any probability 
distribution exists uniquely. However, the arbitrary-curvature case seems not to have 
been studied systematically until the 1970's, when it was investigated principally by 
Karcher and Grove ( dEl 13 IE] ; see also [Hj §§4-5). 

Unlike in Euclidean space, on a general Riemannian manifold it is clear that some 
restriction on the set of points to be averaged is necessary; for example there is no 
reasonable way uniquely to define the average of antipodal points on a sphere. Aver- 
aging can be done sensibly only on sets satisfying some suitable convexity condition 
(of which there are several). One notion of convexity was given in Definition 12. 5| some 
other relevant notions are given below. The reader is warned that different authors 
attach different names to these notions. 

Definition 3.1 Let U C M. We call U 

• self-visible if any two points of U can be joined by at least one geodesic, not 
necessarily minimal, lying in U; 

• simple if for any two points in U there is exactly one connecting geodesic lying 
in U; 

• solipsistically convex if for any two points p,q & U there exists a connecting 
geodesic in U whose length is minimal among all connecting arcs lying in U 
(hence of length du{p, q)). 

A function / defined on a self- visible set U is called (strictly) convex on U if its 
restriction to every geodesic in [/ is a (strictly) convex function of the arclength 
parameter. 

If / is then a sufficient condition for / to be convex on U is that its covariant 
Hessian be posit ive-semidefinite on f/; strict positivity implies strict convexity. 

Definition 3.2 (cf. p. 3) An open ball B = Bp{p) is a regular geodesic ball if (i) 
P < nnj(p), and (ii) 

p-max(0, A(5))^/2 < 7r/2. (3.2) 
For p a M define the regularity radius 

Tj-egip) '■= sup{p I Bp{p) is a regular geodesic ball} 
and the regular convexity radius 

^regcvx 

(p) = min(rreg(p) ip))- 
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For regular geodesic balls one has the following theorem of Jost |T3]; see [T^ 
Theorem 5.3 and ^H] Theorem 1.7. 

Theorem 3.3 Let B be a regular geodesic ball in a complete Riemannian manifold. 
Then B is simple and solipsistically convex, and geodesies in B contain no pairs of 
conjugate points. 

Completeness of the ambient manifold is not essential in Theorem 13.31 if i? = 
Bp{p), it suffices that exp^ be defined on the closed ball of radius p centered at 
G TpM. The example of an open ball of radius tt in the unit circle shows that a 
regular geodesic ball need not be convex. More generally Theorem 13.31 implies that 
regular geodesic ball B G M is convex if and only if the distance functions and 
dM coincide on B. 

There are various relations among rinj,rcvx, and rreg; we mention only a few. By 
definition, 'rinj(p) is the largest of the three radii at p. If M is complete and has 
constant positive curvature, then Bonnet's Theorem Theorem 1.26(2)) implies 
that rcvx(p) < rj.eg{p). But in general, a geodesic ball can be convex but not regular 
(see ^5 example), or, as the circle example shows, regular but not convex. 

Notation. If g G M can be joined by a unique minimal geodesic, we denote by 
exp~^(g) the unique pre- image of q (under exp^) of smallest norm. 

Now let Q be an arbitrary subset of a convex set U C M, and let /i be a probability 
measure on Q. For each p & U define 



More properly these objects should be subscripted with the pair {Q,fi). However, in 
most of our results n enters primarily through the geometry of Q rather than in the 
behavior of /i on Q. To emphasize this we will stick to the imperfect notation above. 

Definition 3.4 Let U C M he convex. (1) Let Q C U, let p he a probability 
measure on Q, and define a vector field Yq by ()3.3|) . If Yq{p) = at a unique point 
p ^ U, we call p the (Riemannian) center of mass of {Q,n), relative to U. (2) Let 
Q = {q . . . , qm} he a finite list of points in U, let Q he the set of distinct elements 
of Q, let fi he the normalized counting-measure on Q, and define Yq as above. If 
Yq{p) = at a unique point p E U, we call p the Riemannian average of the hst Q, 
relative to U. 

We call a point a center of mass of (Q,/i) (respectively, a Riemannian average of 
the list Q) if it is the center of mass of (resp., Riemannian average) relative to some 
convex superset. 




Q 



(3.3) 




(3.4) 
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For a finite list Q, the definition of Riemannian average relative to U is simply 
the zero (assumed unique in U) of the vector field Y = Yq on U defined by Y{p) = 
^ Z)iexpp^(gi) e TpM. Since Z]i(exp~H?i) - ^p) = 0^ heuristically, Y{p) represents 
"balanced" average of the points qi as seen from p. Alternatively, we can define 
fq-.U^ H, fgi^p) = ^Yl^id-iliyPY: and assume that /q is minimized uniquely 
at q E U. The Gauss Lemma ( 4j p. 8]) implies that grad((i(g, ■))|p = — expp"'^(g), so 
grad(/g) = —Yq, implying that Yq has its zero at q. Hence Definition 13.41 extends 
both the "balancing" and "least-squares" properties of the Euclidean average. 

Remark 3.5 Definition 13.41 generalizes easily to a solipsistically convex or simple 
set U. In this case denote by exp~^''^(g) that pre- image f of g (under exp^) of 
smallest norm for which expp(tt>) G f/, < t < 1. In (j3.3p we can replace exp~^(g) 
by exp~^''^(g), and d{p,q) by || exp~^'^(g)|| (in the solipsistically convex case this is 
just du{p,q)). With these replacements it is still true that grad(/Q) = —Yq, but the 
interpretation of Yq{p) as an average of points as seen from p is less compelling. 

We will refine Definition 13 .41 later for a case in which one center of mass is singled 
out, allowing us to dispense with the awkward "relative to U" fDefinition I3.12p . 

Following 123 121]; for example, we will call any relative minimum of fq a 
Karcher mean. Thus a Frechet mean is necessarily a Karcher mean, but, absent extra 
hypotheses, not vice-versa. A center of mass of (Q,/i) under Definition 13.41 is simply 
a Karcher mean that lies inside some convex superset of Q. 

Karcher proves a somewhat more general version of the following theorem ( [TB] 
Theorem 1.2, Definition 1.3, and Theorem 1.5). 

Theorem 3.6 (Karcher) Let {M,g) be a Riemannian manifold. Assume that Q C 
B C M, where B = Bp{po) is a strongly convex ball. Let A = A(i?) be the supremum 
of the sectional curvatures in B. Then, with fg and Yq defined as above, 

(a) grad(/Q) = -Yq. 

(b) The function /q achieves a minimum value on B, and hence Yq has a zero in 

B. 

(c) If p ■ max(0, A(i?))^/^ < n/A, then the minimum of fq on B is achieved at a 
unique point q, and for any point p E B we have 

*.5)<r«wii-{f'^''^' lill (3.5) 

where, for A > 0, h{A,x) = A^/^x cot(A^/^x). 

In PU], Karcher defines the center of mass to be the location of the minimum of /q 
on Bp. However, his proof of existence and uniqueness of the minimum also implies 
uniqueness of the zero of Yq, so under the hypotheses of Theorem 13.61 this definition 
coincides with ours; indeed, the geometric Definition 13.41 is the one used in 0. 
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Note that the ball B in Theorem 13.61 is both geodesically convex and regular. 
If p < ^rreg(po) then the requirement on p in (c) is automatically satisfied; hence 
the upper limit on the radius of the balls for which part (c) is applicable is at least 
min(|ri.eg(po)5 ''^cvx(po)) but no greater than ri.egcvx(po)- In Theorem 7.3, W. S. 
Kendall strengthened the uniqueness assertion^ in Theorem I3.(jr c): 

Theorem 3.7 (W. S. Kendall) A mass distribution supported in a regular geodesic 
ball B has at most one Karcher mean in B. 

In other words, as far as the uniqueness statement is concerned, as long as we 
assume p < rinj(po) Karcher's 7r/4 can be replaced with 7r/2, and the ball -Bp(po) need 
not be assumed convex. 

In general, Karcher means are not unique in the large, cf. [HI Ell- -^^^ example, 
given a set Q of two equally- weighted points in the unit circle 5*^, the midpoints of each 
of the two arc joining the points is a Karcher mean. The statistically-natural absolute 
minimization of /g of course distinguishes one of these midpoints as the preferred one. 
However, we suggest an alternative, purely geometric way of distinguishing one of the 
Karcher means from the rest: just as in Euclidean space, the center of mass of a 
distribution p should be in the convex hull, suitably defined, of its support — the 
average of a set Q should be not only near Q, but "within" Q. In the example 
above, unless the two points are antipodal — in which case the convex hull is not 
defined — only one of the two midpoints meets this criterion. Thus in this example 
the convex-hull and global-minimization criteria coincide, but the author does not 
know to what extent these criteria overlap in general. 

The definition of "convex hull" varies in the literature. The notion best tailored 
to our needs is that of the o-hull defined below. 

Definition 3.8 Call a. set Q G M hulled if it is contained in some convex set, and 
o-hulled if it is contained in some open strongly convex set. If Q is hulled (resp. 
o- hulled), define the convex hull of Q (respectively, the convex o-hull of Q), written 
hull(Q) (resp., ohull((5)) to be the intersection of all convex sets (resp. open strongly 
convex sets) containing Q. We will usually refer to these objects just as hulls and 
o- hulls. 

Note that if a set is hulled, then the minimal geodesic between any two of its 
points exists and is unique. 

Obviously hulls and o-hulls, when they exist, are convex sets, and hull(Q) C 
ohull(Q). The o-hull may fail to exist even when the hull exists (example in S^: a 

^Kendall's proof does not yield existence of Karcher means as we have defined them. It is clear 
from the context and the proof that the existence asserted in the theorem as stated in ^H] is the 
existence of a "solipsistic Karcher mean", in which the distance function d = cIm in l|3.4(l is replaced 
hy ds- The existence argument requires grad(/Q) to be outward-pointing on the boundary of the 
ball, which is guaranteed only under the solipsistic interpretation of /q. 
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semicircle closed at one endpoint and open at the other). However in R", at least, 
the differences between hull and o-hull are minor: one always has 

hull(Q) C ohull(g) C hull(g) (3.6) 

(both inclusions can be strict; see [ID])- Conceivably ()3.6|) holds generally for o-hulled 
sets in Riemannian manifolds provided hull(Q) has compact closure. 

All sets Q of interest in this paper are contained in a convex open ball and so 
are o-hulled. As noted above, we will use o-huUs to distinguish one particular center 
of mass. Neither Karcher's theorem nor Kendall's generalization, as stated, immedi- 
ately eliminates the unsettling possibility that Q could be contained in two different 
convex regular geodesic balls, and that Yq could have two zeroes (each of which could 
even be an absolute minimum of /q), each contained in one ball but not the other. 
However, the proofs in and ^H] imply more than is explicitly stated in either pa- 
per, and a minor extension of an ingredient of these proofs shows that this unwanted 
phenomenon cannot happen^. We give this extension in Lemma (3.101 and Corollary 
13.111 The corollary leads us to the convex-hull criterion in Definition 13. 121 below. 

While ohull(Q) is the smallest set we can construct naturally from the family of 
open strongly convex supersets of Q, the largest set we can construct from this family 
also has relevance: 

Definition 3.9 For any o-hulled set Q C M, define star((5) to be the union of all 
open strongly convex supersets of Q. Analogously, define regstar(Q) to be the union of 
all regular geodesic balls containing Q. Note that star(Q) depends only on ohull(Q). 

Given an open set U (Z M and a boundary point p G dU, call a tangent vector 
V G TpM outward-pointing for U if v ^ and if for some curve in M with 
7'(0) = —V we have 7((0, e)) C U for some e > 0. 

For reference, we record the following obvious facts (proof left to the reader). 

Lemma 3.10 Let U G M be an open self-visible set with U compact, and let f be a 
function defined on some open neighborhood of U . 

(a) //grad/ is outward-pointing at each point of dU , then f\jj never achieves its 
minimum at a point of dU, and hence achieves it at some critical point q E U. 

(b) If f is convex on U then the critical points of f in U, if any, are global minima 
of flu . If f is strictly convex on U then there is at most one critical point. | 



^|23j uses a different partial solution to this problem: if in Karcher's theorem it is additionally 
assumed that 2p < rinj(po) and hypothesis (c) is satisfied with A{B) replaced by A(i?2p(po)) — then 
Kendall's theorem implies that the Karcher mean of {Q, ii) in Bp is the unique Frechet mean of 
Our alternative approach does not require this extra hypothesis in order to single out a 
"best" Karcher mean, but our geometric definition of "best" differs from the statistical definition. 
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Corollary 3.11 Let Q C M. Suppose that f : M ^ H is on an open neighbor- 
hood of sta.i{Q) and that for every open strongly convex superset U D Q, the gradient 
of f is outward-pointing along dU. Suppose that there exists an open strongly convex 
superset Ui D Q of compact closure for which f\ui achieves a minimum at some point 
q, and that q is the unique local minimum of f in Ui. Then q G ohull(Q) and is the 
unique local minimum of f in o\m\\{Q). IfU is any collection of supersets of Q on 
each of which f has a unique local minimum, then q is the unique local minimum of 

f Uueu U ■ 

Proof: Let U D Q he open and strongly convex, with U compact. Then Uf]Ui is 
an open strongly convex superset of Q, so V/ is outward-pointing along d{U fl Ui), 
and Uf]Ui is compact. By Lemma (3. 101 f\unUi achieves a minimum at some point 
q. But q is the unique local minimum of / in Ui; hence q = q, so q ^ U for every 
open convex superset of Q. | 

In the case of our functions fg, the key point is that if U is an arbitrary open 
strongly convex superset of Q, then from (j3.3p the vector field Yq is inward-pointing 
along dU, so grad(/(5) is outward-pointing and Corollary 13.111 applies. Thus, while 
strongly convex or regular geodesic balls are essential to the proofs of Karcher's and 
Kendall's theorems (as well as to the proof of Theorem 14.81 in this paper), once one 
has existence and uniqueness within even one bounded strongly convex open ball, 
balls can essentially be dispensed with in favor of general strongly convex open sets. 
This allows us to frame our desired characterization of the center of mass, or average. 

Definition 3.12 If {Q,fi) has a unique center of mass q in ohull(Q), we call q the 
primary center of mass, or simply the center of mass, of {Q,fi). If Q is a finite list of 
points and fi is the normalized counting measure, we also refer to the primary center 
of mass as the (Riemannian) average of Q. 

Thus, combining Theorems 13 . fjl and 13 . 71 wit h Corollary l3.1H we have the following. 

Corollary 3.13 Suppose Q G M is contained in a strongly convex regular geodesic 
ball. Then for any probability distribution /i on Q, the primary center of mass q of 
{Q, ii) exists, lies in ohull(Q), and is the unique Karcher mean of {Q, fi) in regstar(Q). 
If fq has a local minimum at q, then the restriction of fq to regstar(Q) achieves its 
absolute minimum at q and nowhere else. | 

In particular, Karcher means given by any two balls containing Q in Karcher's or 
Kendall's theorem coincide. 

Note that regstar(Q) can be much larger than any single regular geodesic ball. 
For example, let M be the unit sphere S"^. Let Q C S*"" be a set of two non-antipodal 
points, let C be the minimal arc joining the points, and let Copp be the arc antipodal to 
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C. Then regstar((5) = star((5) = S*" — Copp. In this and some other obvious examples 
on spheres, regstar((5) coincides with IC{Q):= the largest open superset of Q that 
does not meet the cut-locus of any point of hull((5). It is plausible that in general 
regstar((5) C IC{Q). However an example in shows that in general regstar((5) in 
Corollary IH. 1 HI cannot be replaced by IC{Q) in general without sacrificing uniqueness. 

It is plausible that Corollary 13.131 remains true with "ohull" by "hull" , but the 
author has not found a proof. However, Cheeger and GromoU's general structure 
theorem for convex sets ([3] Theorem 1.6; note that our "convex" is Cheeger and 
Gromoll's "strongly convex") shows that hull((5) has a well-defined dimension. Only 
if this dimension equals dim(M) is our definition of ohull exactly what is needed for 
the given proof of Corollarv l3.13l However, Corollaries l3 .111 and 13 . 1 31 can be sharpened 
to include the case dim(hull((5)) < dim(M); see [TU] (the original preprint version of 
this paper, available from the author). 

4 Constructing the primary center of mass 

The methods of §2 allow us to give a constructive proof of a version of Theorem 13.61 
This section is devoted to the proof and a discussion of the consequences. Throughout 
we assume that the set Q lies in a strongly convex ball B. 

The vector field Yq on B gives rise to a map "^q = "^Yq = exp oYq : 5 ^ M as in 
Section El To apply our contracting-mapping result. Theorem 12. 8^ we need bounds 
on IIVYq -|- /||. Heuristically it is easy to understand why this quantity is small, 
provided p is small enough. Let g"^ : T*M T*M T*M (g) TM = End(TM) 
be the isomorphism defined by using the metric to identify T*M with TM ("raising 
an index" on the second factor of T*M T*M). For any function / : M — > R, let 
Hess(/) = VV/ G r(Sym^T*M) denote its covariant Hessian, and let Hess'(/) = 
g-^(Hess(/)) G r(End(TM). From Theorem EBa) we have Vlg = -Hess'(/Q). In 
normal coordinates {x*} centered at a point q, for points near q we have 

Hess(^rg) = '^dx' <S) dx' ^ ^ gijdx' (g) dx^ = g, 

SO that Hess'(|rg) ^ g~^g = I near q. From [TB] Theorem 1.5 we have 

{VYq){p) = - [ Ress'i^rX dp{q). (4.1) 

Thus for general Q contained in a small set, at points near Q the endomorphism 
— VYq is an average of endomorphisms close to the identity, and hence is close to the 
identity. 

A quantitative bound on HVYq + /|| can be obtained in terms of the functions 
Hq defined by 

h+{x) = xcotx (0 < X < vr only), ho{x) = 1, h^{x) = xcothx. (4.2) 
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The function is monotone decreasing (hence < 1), while /i_ is monotone increasing 
(hence > 1). Define 

MA,r) = /^sign(A)(|Ar/V) = ^|^, (4.3) 
tp{\,r) = sign(A)(l - /i(A,r)). (4.4) 



Then h is an analytic (entire) function of Ar^, with h{X,r) = 1 — |Ar^ + 0((Ar 
For every A the function r i— ipi^, r) is nonnegative, monotone increasing on [0, vr) if 
A > and on [0, oo) if A < 0, and ip{X,r) = l\X\r^ + 0(AV). For 5 < A e R and 
upper limit on r applying only if A > 0), define 

V'max(5,A,r) = max(^/'(A,r),V(5,r)) (4.5) 

= -\K\r^ + 0{\K\\^) (4.6) 
3 

where \K\ = max(|5|, |A|). Note that ^/^max is monotone increasing in A and r, 

2 I 2 CSC'^ T I 

monotone decreasing in 5. Observing that -^-^^^(±1, r) = i ^ ,2 } " ^^(il; ''^) > 0, 



dr^^^ ' > \ 2csch^r ^ 
it also follows that V'max is a convex function of each argument with the other two 
held fixed. The relevance of V^max is in the following lemma. 

Lemma 4.1 Let p,q & M with d{p, q) < Tinj(g) and let 5 and A he lower and upper 
hounds, respectively, for the sectional curvatures of M along the minimal geodesic 
from q to p 'j; if A > also assume d{p, q) < vrA"^/^. Then 



||Hess'(-r^^) -/||(p) < ^^,.(5, A, ci(g,p)). (4.7) 
Ifd{p,q) ■max(0,A)i/2 < n/2, then 

Hess(^r,2)|,>0. (4.8) 
Proof: Both statements follow immediately from Lemma f7. II in the Appendix. | 



Henceforth we assume that Q lies in a ball Bo{po) and analyze the vector field 
Yq on a possibly larger concentric ball B = Bp{pQ), still assumed strongly convex. 
We apply the lemma to points p E B,q E Q, setting 6 = 6{B),A = A{B). For 
such points we have d{p, q) < p + D, so to meet the potential restriction on d{p, q) 
in the lemma, we assume that (p + D) max(0, A)^/^ < vr. From ()4.1|) . ()4.7|) . and the 
monotonicity of "i/^max we then have 

||VrQ + /|| = || I (Hess'(^rJ)-J)rfMg)ll<V'max(^,A,p + I5). (4.9) 

J Q 
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We also have 

\\YQip)\\ <snp d{p,q)<p + D. (4.10) 

Hence from Theorem 12. 8[ for all p G -B we have 

mQ).p\\<K{po;p,D) := MiP + D)\K\'^')+Ci{S,p + D) ij^US,A,p + D). (4.11) 

where \K\ = \K\{Bp{pQ)), and where the choice of sign in (p^ is governed by the 
following convention. 

Notation Convention 4.2 For the remainder of this paper, when an expression of 
the form (f)±{x) appears, (p+ix) is to be used if M is a locally symmetric space of 
nonnegative curvature and x < Sn/A; (p^^x) is to he used otherwise. 

To ensure that \I/q is a contraction we want k,{pq\ p, D) < 1, which will be true for 
small p since (t>±{x) and '?/'max('5 -^x) are Oi^x^). This is not enough by itself to ensure 
existence of a fixed point: 

Definition 4.3 Call a map \1' : (domain(\l/) C M) — * M tethered to Q if, for every 
strongly convex regular geodesic ball B containing Q, (i) \1/ is defined on B and (ii) 
$(fi) C B. 

If we knew \I'q to be tethered to Q (which implicitly requires domain(\l/) D 
regstarQ), we could apply the general form of the Contracting Mapping Theorem 
(which assumes a priori that the contracting map preserves its domain) to conclude 
that \I/q has a unique fixed point in Bplpo) as long as n{po] p, D) < 1. In Euclidean 
space, \1/q is always tethered to Q trivially: \I'q maps the entire space to a single 
point contained in the convex hull of Q. On a general manifold, if Q consists of a 
single point then \I'q is tethered to Q for the same trivial reason. Thus it seems likely 
that on general M, tethering will occur provided diam((5) is sufficiently small. It is 
plausible that this happens for any Q contained in a strongly convex regular geodesic 
ball, but the author has neither a proof nor a counterexample. The lack of such a 
proof is the sole reason that in our center-of-mass application we use Theorem 12.11 
(in the guise of Theorem 12. 8p rather than the more general Contracting Mapping 
Theorem (but note that Theorem 12.81 mav still be needed in other applications, i.e. 
those using maps with Y not of the form Yq, since most such general maps will 
not be tethered). The cost is that the upper bound on the diameter of Q (or other 
measures of size such as the "circumradius" ) for which we can ensure that "^q has 
a fixed point is smaller than it would be if we knew that tethering occurred. Since 
it may be possible to prove tethering, either in general or in specific cases, in the 
remaining theorems of this paper we include statements of what one can conclude in 
the tethered case. 
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Assuming k{po; p, D) < 1, to conclude from Theorem 12.81 that \1/q has a fixed 
point, we additionally need to have 

||Fq(Po)|| < {l-t^{po\P,D))p:=s{p,-p,D). (4.12) 

Clearly 1)4.101) is of no help here. However, the left-hand side of ()4.12|) does not depend 
intrinsically upon p, but only upon {Q,p). We are taking p > D, so furthermore 
s{po] p, D) > s{po; p, p) := S2{po;p). The basis of the argument over the next few 
pages is simply that as long as ||5^q(po)|| is less than the maximum value of the 
function S2{po', ■), there will be some radius p for which ()4.12|) is satisfied even with 
D = p, hence for all D < p as well. 

Note also that ||Fq(po)|| < -D, so that an upper bound on D implies an upper 
bound on i|5^Q(po)il- Thus the most general conclusions we eventually draw will be 
those that have an upper bound only on ||yQ(po)|| (hence on {Q,fj,)) as a hypothesis, 
but as a corollary all such conclusions hold with an upper bound on D, a more easily 
checked and therefore more practical hypothesis. Eventually in Corollary 14.111 we 
will take po to lie in Q, which will give us even more control since we can then take 
D = diam((5). 

Since we are interested not just in the existence of "good" radii p and D, but 
on estimating their size, we first prove a lemma establishing some properties of the 
function s; these will be used to estimate the size of balls on which \1/q has a fixed 
point. In practice one is usually not presented with an explicit growth rate for \6\, |A|, 
or \K\ as functions of p in ()4.11|1 . so we also examine the consequences of a (potentially 
less sharp but usually more practical version of the bound in ()4.12|1 . replacing the 
function s by a function s defined below. The sharp bounds, however, are needed for 
the best estimates in JT] for an averaging algorithm on size-and-shape spaces. 

Definition 4.4 Let p e M. (a) For < < p < rreg(p), let Ap^p = A{Bp{p)), 
6p,p = 5{Bp{p)), \K\p,p = \K\{Bp{p)), and 

k{p; p, D) = Mip + D)\K\l/;) + Ci(5p,„ p + /^)^„,ax(5p,p, Ap,„ p + D),(4.13) 
s{p-p,D) = {l-K{p;p,D))p. (4.14) 

(If Sp = —oo interpret ()4.13|) as k(p; p, D) = oo.) 

(b) Let ri e (0,rreg(p)), and let A(-) (respectively 6{-)) be any continuous mono- 
tonically increasing (resp. decreasing) function on [0, ri] such that Ap^ < A(p), 6p^p > 
6{p), n ■ max(0, A(ri))i/2 < 7^/2. For < D < p < ri define k{p, A, 6; p, D) to he the 
right-hand side of ()4.13|) with Ap^p, 6p^p, \K\p^p replaced by A(p), 5(p), max(|A(p)|, |5(p)|) 
respectively, and define 

s{p, D) = 5(A, ~5- p, D) = {1- k{K, ~6; p, D))p. (4.15) 
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In practice, A and S will usually be constant functions, global upper and lower 
curvature bounds on Br^ij)). We define s in greater generality above because this 
enables not only stronger results, but shorter proofs: anything proven for the more 
general functions s applies to the special case s = s. 

We construct from such a function s several numbers and functions of D: -Dcrit, -Dma: 
and pi, all defined below. The meaning of the pi{p,ri;D) is indicated by the pi in 
Figure 1; the qualitative correctness of Figure 1 is proven in Lemma f4. 51 



Figure 1. A (not-to-scale) sketch of s{p,D) versus p for some fixed D < Ucrit, assuming ri > p4. 
A sketch of s{p,D) for a fixed D < i?crit would be similar, with the pi replaced hy pi,l < i < 4. 
-Dcrit is the maximum value of s{p, Dcrit); for D > Dent, the graph of s{p, D) lies entirely below the 
horizontal line at height D. To illustrate the maximum number of distinct radii we have sketched 
the case in which p4 is strictly less than ri, i.e. in which k{p,D) reaches 1 before p reaches ri. 
The picture for smaller ri can be obtained from this one by moving ri to the left, say to ri,acw, 
truncating the diagram to the right of ri^ncw and decreasing D, if necessary, to keep it less than 
the maximum value of s on [D, ri now] (hence keeping ri^new > Pi(Dnew)). If any of P2, Ps, Pi in the 
picture above is to the right of ri.ncw, the corresponding pi^ncw is defined to be ri_now 

The definitions of Dcrit{p,f^i), D^i^^{p,ri) and the Pi(p, ti;-) are given in (|4.16l - 
I4.18P and (j4.2Ufl¥!^^ below. Here and below we suppress the parameters A and S 
rather than write Dcrit (p, ''"i, A, 5) etc.; these parameters are always present implic- 
itly. For the sharp-curvature-bound case {§ = s), we omit the tildes and just write 
-Dcrit (p, ''^1)5 -Dmax(P5 ''^i) and pi{p,ri). Since «;(■,■, 0,0) = 0, the sets over which the 
suprema are taken below are nonempty and the suprema well-defined. 





(4.16) 
(4.17) 



SUpjAnaxlP,?^!) | ^ < '^rcg(p)}- 
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^crit(p, ri) = sup{D e [0, n] I 3p E [D, n] for which s(p, D) > D}. (4.18) 
Dcv\t{p) = sup{D e [0,r,eg{p)) I 3p G [D,rreg(p)) for which s{p, D) > D} 

(4.19) 

= sup{L'crit(p,'^i) I ri < rrcg(p)}. 



For < D < Drai,Ap,ri), define 

P4(p,ri;D) = sup{pG [Z},ri] I /i(p,D) < 1}; (4.21) 
for < D < Dijii^^{p) define 

p,ip;D) = sup{p G [/^,r,eg(p) | ^ip, D) < 1}. (4.22) 
For < D < Drrit define 



P3(p, ri; D) = sup{p G [0, n] | 5(p, D) > D}, (4.23) 
pi(p, n; D) = inf{p G [0, n] I 5(p, /}) > D}; (4.24) 

for < -D < -Dcrit define 

Ps{p;D) = sup{pG[0,r,eg(p))|5(p,/^)>/^}, (4.25) 
Pi(p;D) = inf{pG [0,r,,g(p)) I 5(p,D) (4.26) 

Note that for i = 1,3,4, pi{p; D) can ahernatively be written as a supremum 
or infimum (over ri) of pi{p,ri;D) as we did above for -Dmax(p) and Dcrit(p)- Note 
also that in KWi and (IOT|l . "k(-, ■) < 1" can be replaced by "s(-, ■) > 0" without 
altering the definitions of -Dmax and p^. 

The technical lemma below establishes some useful properties of the objects just 
defined, including monotonicity in parameters. 



Lemma 4.5 Let p E M and let ri G {0,r^cg{p))- Let A, 6 be continuous mono- 
tone bounds on curvature as in Definition \4.4][ b) , and let -Dcrit = -Dcrit (p, '"i), -Dmax = 
D'max(p,n), and pi{-) =pj(p,ri;-) be as m U.l(^4.24\) . 

For D G [0,ri] let = {p e [0, ri] | D < s{p,D)}. For each D, the set 
Jd is either empty or an interval with endpoints pi{D)^p^{D). If D2 > Di then 
Jd2 C Jdi, so {D I Jd 7^ 0} is an interval whose right endpoint is -Dcrit- -Dcrit > 
and rio<D<-D„it consists of a single point p^rit, satisfying D„\t = s(pcrit, -Dcrit) = 
maXpgp n] ■s(P) -Dcrit), the maximum being achieved uniquely. The following are true: 
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1- -Dmax > -Dcrit, with equality if and only if Dent = ri. 

3. Pi{D) > p^{D) for all D < D^^x- 
I (2D^ax) ■ max(0, A(D^,J) < ,r/2. 



5. For each D G [0,ri], the function p t-^ k{p,D) on [D,ri] is continuous, mono- 
tone increasing, and convex. The function p \—>- K,{p, p) is 0{\K\p p^), where 
|i^|, = max(|A(p)|,|5(p)|). 

6. For each D G [0,ri], the function s{-,D) is concave and achieves its maximum 
at a unique point P2{D) G (0,ri]. 

For each D < -Dcrit the following are true, where pi = Pi{D). 



7- P3 > P3, Po < Po, and pi < pi. 

8. The following order-relations hold (cf. Figure 1): 

D < Po < Pl < Pcrit < P3 < P4 < Dmax < ^i- (4.27) 



9. (p4 + £') ■max(0,A(p4))^/2 < 7r/2. 



As a special case, all conclusions above are true with the tildes erased. As a corollary, 
conclusion^is true also with D^s,^{p; ri) replaced by -Dmax(p); conclusions\^and\Mare 
true with the tildes erased and with [D, ri] replaced by [D, Vj-cglp)); and conclusions^^\^ 
are true with Z)crit(p; '^i) and Dmnxip, 1^1) replaced by Dcrit(p) and D^g_^{j>) respectively, 
Pi{p,ri;D) replaced by pi{p;D), and "p4 < ri" replaced by "p^ < r^^^ij))". 

Proof: From the definition of k continuity in all parameters is clear, and it is easy to 
check that k{p,D) < k{p, p) = 0{\K\p p^). We have already noted that ipuia^i^, A,r) 
is monotone increasing in r and A, decreasing in 6, and convex in each variable 
separately; the same is true of Ci{6, r). The functions (j)± are monotone increasing and 
convex. Monotonicity and convexity of (j)±,H, and Ci are retained after composition 
with the monotone functions 6{p), A. 

It follows that with D held fixed, k{-,D) is continuous, monotone increasing and 
convex, and hence that s{-,D) is continuous, concave, and, because of the factor of 
p in ()4.15p and monotonicity, nonconstant on any interval of positive length. Since 
k{p,0) = 0{p'^), s(p, 0) > for p > sufficiently small. Hence Jq is nonempty, and 
by continuity so is Jd for sufficiently small positive D. Hence Dcrit > 0. 

For each fixed D, the concavity and local nonconstancy of the function s{-,D) 
implies that its maximum value Pc{D) on [0,ri] is achieved at a unique point p2{D), 
and for any a < Pc{D) the set {p G [0, ri] \ s{p) > a} is an interval; in particular each 
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set Jd is an interval. Since D2 > Di implies s(p, D2) < s{p, Di) strictly for p > 0, the 
asserted nesting of the intervals Jr, also follows. The intersection of the nonempty Jo 
is nonempty because their closures are nested, and the intersection has only one point 
Pcrit since s(-,Dcrit) is nowhere constant. Continuity implies Dent = S(Pcrit, -Dcrit)- 

From its definition clearly s(p, D) < p. All the inequalities asserted in statement 
ISl follow immediately from the foregoing, except for po ^ Pi- The latter inequality 
follows from chasing through the definitions and monotonicity of the ingredients in 
k. A helpful observation is that from ()4.13p we have 

k{p,D)>^^^Mp),A{p),p + D)>^lj{A{p),p + D). (4.28) 

It also follows that s(Z)crit, -Dcrit) > S(pcrit,-Dcrit) = -Dcrit > 0, SO that Z)max > -Dcrit- 

The monotonicity of 0±, Ci, and V'max imply that if p < ri, then ^(p, D) < k{p, D), 
and hence s(p, D) < s(p, D). Hence Dcrit < -Dcrit, Pi > Pi, and pi < pi for i = 3,4. 
To establish statements |3 and El we claim first that for D < -Dcrit we have 

(pi(D) + D) max(0, A{pi{D))Y/^ < 7i/2. (4.29) 

This is true for D = 0, so if it is false for some D < Dent then there exists D G (0, -Dcrit) 
for which K{pi{D)) > and (pi(L') + D)A{pi{D)Y/'^ = 7r/2, the latter implying 
'ijj{A{pi{D)), pi{D) + D) = 1. But the combination D > 0, A > implies strict 
inequality in ()4.28|1 . so k{pi{D),D) > 1 and s{pi{D), D) < 0; but from the definition 
of pi we have s{pi{D), D) > D. Hence fl4.29|) holds for all D < D^nt- Therefore if 
statement ini is false, there exists p G {pi{D),pi{D) for which (p + D)A(p)-^/^ = 7r/2. 
From ()4.28|) we again conclude that k{p, D) > 1, and since p > pi(-D) > this implies 
the strict inequality s{p,D) < 0, a contradiction since p G (0,p4(D)). This proves 
statement IHl a shorter version of the same argument yields statement IH | 



Remark 4.6 In Definition 14.41 and Lemma f4.5| the restriction "ri < rrcg(p)" can be 
replaced by the less restrictive "ri ■ max(0, Ap^^J^^^ < 7r/2". 

Coro ll ary 4 .7 Let po e M, < ri < rregcvx(po)- Let -Dcrit, -Dmax, P4, Pi be as in 
(|y/.i?| )-( |^.i^6| ). For < p < ri write Bp for Bpipo). Let Q C Bp^ he equipped with a 
probability measure p, and define Yq and fq by \3. '^3.4^ . Then Yq has at most one 
zero in Bp^ (equivalently, fq has at most one critical point in this hall); at such a 
zero fq achieves its minimum value on Bp^ (in fact, on regstar(Q) ). If D < Dent and 
Q C Bd (or more generally if ||yQ(po)|| ^ D), then Yq has a unique zero q in Bp^, 
andq lies in Bp^. Hence {Q,p) has at most one center of mass in Bp^, and has exactly 
one center of mass in Bp^ if Q C -BDcrit ■ '^^ tethered to Q, these conclusions 

hold with Dcrit replaced hy the (never smaller and usually larger) numher -Dmax- 

We will prove this simultaneously with Theorem 14.81 below. But first, taking ri 
close to rrcgcvx(po) in Corollarv l4.7l note that Lemma l^TSl implies that the restriction on 
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the radius of the ball containing Q in Corollary 14. 71 is more stringent than in Theorem 
I3.6r c). Similarly Lemma 14.51 implies that the conclusion q G Bp^ in the corollary 
above is not as sharp as Karcher's conclusion q G Bp^ , and that the conclusion above 
concerning existence of at most one center of mass in Bp^ is weaker than Kendall's 
conclusion — at most one center of mass in Bt.^^^(pq) — which is itself weaker than the 
uniqueness and minimization statement in Corollary 13.131 (However, we will see 
in [jni that if (M, g) has non-negative curvature, then for D < -Dcrit the uniqueness 
statement in Corollarv 14.71 is actually stronger than Karcher's.) In fact, in view of 
Corollary 13.131 Bp^ can be replaced by regstar(Q) in the conclusions (but not the 
hypotheses) of Corollarv 14.71 

Thus, were Corollarv 14 . 71 the only outcome of the contracting-mapping approach, 
we would have gained little from it. However, the contracting-mapping approach ad- 
ditionally provides an algorithmic construction of the center of mass, one that is easily 
implemented in spaces for which the exponential map and its inverse are explicitly 
known, and in particular for shape spaces. In practice, any algorithm intended to 
average a list Q of points in a space is initialized at a point qo ^ Q, but there are 
questions of whether the algorithm converges and whether its limit (if any) depends 
on the choice of initial point. As mentioned in the introduction, CPA algorithms 
converge quite rapidly in practical applications, but it is not readily apparent why 
this happens. For a given algorithm, one may be able to prove initial-point inde- 
pendence of the limit by one argument, and convergence by another, and perhaps 
estimate the convergence rate still another way. However, the contracting-mapping 
approach allows one to answer all these questions at once (although answering them 
individually by other means may lead to sharper answers, as in ^23| Proposition 3, 
for initial-point independence in the GPA-S algorithm). Thus the added value of this 
approach lies in the following theorem, in which we state only those direct conclusions 
of the contracting-mapping approach neither contained in nor relying on Karcher's 
and Kendall's theorems (except for the use of po in conclusion E)) . In ^we will see 
that by estimating the convergence rate of \I/q(po) and combining this with Kendall's 
uniqueness result, we can considerably strengthen certain parts of Theorem 14.81 see 
Theorem 15.31 In statement |21 of the theorem below, note that with the indicated 
restrictions on D, existence of the primary center of mass is guaranteed by Corollary 
14. 7[ as well as by Theorem 13.61 

Theorem 4.8 Let pq G M, < ri < rrcgcvx(Po)/ for < p < ri write Bp for Bpipo). 
Let A(-),5(-) he continuous monotone upper and lower hounds on curvature as in 
Definition \4-4}f b )■ Let Q C Br^ he equipped with a prohahility measure p, and define 
fq by \3. ^J.^| ). Then, using the notation \4.1(^ - ^4.2b\l with the parameter po 
suppressed, the following are true. 

1- ^max(jl) < ^max(ri) < /^max and Z)crit('"l) < ^crit(?'l) < ^crit- In particular if 

D < Dcriti^i) then all the pi{D) are defined, and 

D < poiD) < pi{D) < perit < PsiD) < p^{D) < D^ax(ri) < n (4.30) 
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where pcrit is value of p that maximizes s(p, Z^crit)- 

2. For all D G (0,ri], if Q C Br, and p < Pi{D), then the map \&q = expoFg : 
Bp M is a contraction with constant li{po', p, D). 

3. Assume that Q C Bd (or more generally that ||^q(po)|| ^ D) and that either 

(i) D < Dcrit and pi{D) < p < p^{D), or 

(a) D < -Dmaxj is tcthcrcd to Q (Definition \4-3\) , and D < p < p/^{D). 

Then preserves preserves each hall Bp. In particular this holds for the D- 
independent radius pcrit- The sequence of iterates ^'qi^q) converges to the pri- 
mary center of mass q of {Q, p) for every q G -Bp3(D) if (i) holds, and for every 
q G Bp^i^D) if (a) holds. In either case q lies in -Bpo(D) H ohull(Q). 

4. For D < Dent the following relations hold: 

PoiD)<po{D), p,{D)<p,{D), p,{D)>p,{D), p,{D)>p^{D). (4.31) 

If the curvature hounds A, 5 are taken to he constants (e.g. A = A(i?ri),5 = 
5{Br^)), then the lower hound D^nt on -Dcrit is a universal function of the num- 
hers ri, A, and 5, depending in no other way on the geometry of {M,g). Sim- 
ilarly the lower hounds Z^max on -Dmax; Pi on pi for 3 < i < 4, and the upper 
hounds Pi on pi for < z < 1, are universal functions o/ri, A, 6, and D. 

Remark 4.9 The chief point of the last two sentences in Statement 0] is that -Dcrit? 
the critical upper bound for D in Theorem 14.81 and p^lD), the radius of the ball 
on which the convergence in Statement 01 is guaranteed, are impossible to compute 
without knowing the functions p 1— >• S{Bp),p 1— >• A{Bp) precisely. Thus Statement IH 
gives more easily used, if less sharp, lower bounds on these numbers. The analogous 
statement for pi will be used in §6 when we estimate the convergence rate of the 
sequence {"^Qipo)}- 

Remark 4.10 As D 0, the numbers ps{D) and Pa{D) increase. Thus, the smaller 
the diameter of the set Q, the larger the set on which the theorem shows that the 
iterates ^'q converge, and the larger the set on which the critical point of fq is 
guaranteed to be unique. Also note that lim/3^0 P3(-D) = lim£)_^o P4(-D) = sup{p G 
[0,ri] I fi;(p, 0) < 1} — a considerably larger number than Z^max = sup{p G [0,ri] | 
fi;(p, p) < 1}, which is the upper bound we would have found for the radii of the balls 
i?P3, Bp^ in statement 121 and had we not separated the roles of the variables p and D 
in defining p^ and p4 (i.e. if we had used "2p" in place of "p + Z^" in ()4.9|) and ()4.10|) ). 

Proofs of Corollary 14.71 and Theorem 14.81 Statements [H and El of the theorem 
just restate some of the conclusions of Lemma 14.51 for easy reference. Statement 
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121 follows from ()4.1H) . since s{p) > <^=^ /t(p) < 1. Statement El of Theorem 
14.81 and the existence portion of Corollary 14.71 follow from Theorem 12.81 applied to 
U = Bp^,B = Bp, since for pi < p < ps the fact that D < s{p) ensures that 
the condition ()2.19j) is met. The conclusion that q G Bpg fl ohull(Q) just combines 
Corollary 13 . 1 31 with Karcher's bound ()3.5|1 . 

Integrating ()4.8|) over Q implies that Hess(/Q) > on Bp provided that {p + D) ■ 
max(0, A(p))^/^ < n/2, a condition that Lemma 14.51 (statement ensures is met 
with p = p4. Hence Lemma f3. 101 implies that any critical point of fq in Bp^ is unique 
and minimizes /g on this ball (in fact, on regstar((5) by Corollarv I3.13|) . proving the 
remainder of Corollary 14.71 | 

Theorem 14 . 81 gives us an algorithm for computing the center of mass to any desired 
accuracy: start with some point q, and compute the iterates ^'^(g). As mentioned 
earlier, when Q is a finite set of points, it is natural to initialize the algorithm at 
some point of Q. This motivates the following corollary. In many cases of interest 
the ambient manifold is highly symmetric and the quantities rj-egcvx{q), -Dcrit(9) below 
are independent of q, enabling a much simpler statement of the corollary. 

Corollary 4.11 Let Q C M, p a probability measure on Q. For simplicity let con- 
stants A = A(M), S = S{M) be global upper and lower bounds on sectional curvature. 
Fo r q e Q let Dq{Q) = snp{d{q,qi) \ qi E Q}, let Dcvit{q) =^^crit(g, '^regcvx(g)) be as 
in d^. If for at least one point qo E Q we have Dg^^^Q) < -Dcrit(?o); then the center 
of mass q of {Q,p) exists, and equals lim^^oo ^q(q') for every q E Q. In particular 
this conclusion holds for any q^ G Q z/diam(Q) < DcritiQ) '■= inf{-Dcrit(Q'o) I % ^ Q}- 

Proof: The hypotheses imply that Q C Boiqo), where D = Dq^^{Q). Letting e = 
-Dcrit('?o) - Dgg{Q) and defining ps = P3(go, ?^regcvx(go) - e/2;-D) as in (|4.23|1. we have 
Q C -Bp3(go) since D < p^. Hence statement El of Theorem 14.81 implies the result. | 

Corollary 11.41 follows immediately. 

Centering the underlying convex regular superdisk at a point of Q as in Corollary 
14.111 while practical, is wasteful in terms of the restriction on the diameter of Q. Any 
set Q satisfying the hypotheses of Theorem 14.81 has a (convex regular) circumradius 
circumrad((5): the supremum of the radii of open, strongly convex, regular geodesic 
balls containing Q. For diam(Q) sufficiently small (in particular, if Q admits a convex 
regular superdisk centered at one of its points) circumrad((5) < diam((5), and the 
conclusion of Corollary 14.111 remains valid if diam(Q) is replaced by circumrad(Q) 
and if -Dcrit(Q) is replaced by DcritiPo), where po is the "circumcenter" . As a practical 
matter, the circumcenter is no easier to find than the center of mass, so that this 
strengthening of Corollary 14.111 is only useful if one has a uniform bound on rj-egcvx {p) 
(and therefore on -Dcrit(p)) for P in an appropriate neighborhood of Q. We will discuss 
this more quantitatively in ^ 
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5 Rapid convergence of the algorithms 



Given an iterable map F, let It(-F) denote the algorithm "iterate F". Under any 
contracting-mapping algorithm, the sequence of successive distances from one point 
to the next converges geometrically. However, it is well known that Newton's method 
does even better; each successive distance is bounded by a constant times the square 
of the preceding one. In this section we examine the convergence rates of algorithms 
of the form It(\&y) and It($x) in general (where "^y and $x are as in Theorem 
I2.8|l . and of the averaging algorithm It(\&yp) of Theorem 14.81 and CoroUarv 14.111 in 
particular. We will see that while the convergence rate of It(\E'y) for general Y is only 
geometric (although with a smaller ratio than fi:(\E'y)), the algorithms lt{^x) — more 
closely related to the flat-space Newton's method — have the same quadratic behavior 
as their flat-space cousins. The averaging algorithm falls somewhere in between: 
we obtain only geometric convergence, but with a very small ratio, provided that 
diam((5) is small enough. 

Throughout this section, notation will be as in Theorem 12.81 We denote the 
sequence of iterates {\l/y(po)} or {$^(po)} by {pn}- For any algorithm of the form 
It(\E'y), the following proposition shows that the rate at which d{pn,Pn+i) — ^ is 
completely controlled by bounds on VY + I. 

Proposition 5.1 Let U be a convex set preserved by \&y, let po G U, and for n > 
letpn = ^y(po)- Then 

d{Pn+l,Pn) < (sup ||(Vr + I)p\\) d{pn,Pn-l)- (5.1) 

Proof: From the definition of \l/y, we have 

d{pn+l,Pn) = \\Yn\\- (5.2) 

To analyze how \\Yn\\ changes when we increment n, fix n and let 7 : [0, 1] ^ M be 
the geodesic from pn to Pn+i with initial velocity Y^, thus Pn+i = 7(1), Yn = ^(0); 
and Yn+i = ^7(1)- Let V^(^t)^^{o) denote the operator of parallel transport along 7, 
with direction reversed, from 7(t) back to 7(0), let Ap = (VF + I)\p € End(TpM), 
and let ei = sup^g^ ll^pll- Then 

^(^7(tH7(0)(^7W) +^^7(0)) = ^7W-7(o)(Vy(t)r) +7'(0) 

= P^(i)^^(o)(Ayt)(7'(t))), 

since 7' is parallel along 7, and hence 

^7W-7(0)(>'7W) + 1)^7(0) = / ^7(ti)-7(0)(A(tl)(7'(i^l)))c?^l- (5-3) 

^0 
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The integrand is bounded in norm by ||^7{ti)|| ||7'(^i)ll = ll^7(ti)ll ll^nll- Hence 

\\y'yit)\\ = \\V,it)^,io){Y,it))\\ < {l-t+ [ \\A^^t,)\\ dh) \\Y4 (5.4) 

Jo 

< (l-t + eit)||r„||. (5.5) 
Inserting t = 1 we find < ei||F„||, and hence 

d{Pn+l,Pn) < eid{Pn,Pn-l)- (5.6) 



Thus in algorithms of the form It(\l/y), successive distances decrease geometrically, 
but with ratio ei — a number smaller than the contraction constant K(\E'y) in ()2.17|) . 
and one whose only dependence on curvature is through Y itself. 

To analyze the algorithms lt{^x), proceed as above but with Y = — (VX)^^X; 
continue writing A = VY + I. In this case, for p E U and v G TpM, from ()2.5|) we 
have Ap{v) = Bp{v)(Yp), where Bp{v) = —{(VX)~^o(\/^\/X))\p. Thus, pointwise we 
have 

Pll<^3||>'|| (5.7) 
where = k^^k2. Inserting this bound into ()5.4|) with t = 1, and using ()5.5|) in the 
new integrand, we obtain ||l^n+i|| < |^3(ei + l)||^nP where now ei = k^^e. Thus, 
with /c4 = ksi^ei + l)/2, we have 

d{pn+2,Pn+l) < kid{pn+l,Pnf, (5.8) 

the same quadratic falloff as in flat-space Newton's method. 

Note that the preceding analysis applies to any algorithm for which ()5.7|) holds, a 
condition intermediate between Case 1 and Case 2 of Theorem 12.81 

The convergence rates of Iti^y) and It($j>c) can also be compared as follows. 
With the constants as named above, assume that for thatei < 1, and for $x that 
k^ei < 1. Then for the algorithm It(\&y), we have 

%n+i,Pn)<c^(pi,Po)e?<er\ (5.9) 

whereas for It($x) we have 

d{pn+i,Pn) < k^^{k4d{pi,po)f" < /i;4 ^(A;4ei)^" (5.10) 

(if k^ = 0, interpret ()5.1Up as d{pn+i,Pn) = 0.) 

In the proof of the Contracting Mapping Theorem f Theorem 12.11) . to obtain con- 
vergence of the sequence {p„ = F^'^pq)}, it suffices to know that (i) d{pn,Pn+i) < 
nd{pn-i,Pn) for all n > 1, and (ii) d{po,pi) < (1 — /t)p. One does not need to know 
that F is a contraction on the whole ball B unless one wants to prove uniqueness of 
the fixed point and convergence of the sequence with other starting points. Thus the 
analysis above leads immediately to the following existence/convergence theorem to 
supplement Theorem 12.81 
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Theorem 5.2 Let B = Bpijpo) G M be a convex ball. Assume either of the sets of 
hypotheses listed as "Case 1" and "Case 2" in Theorem \2. with U replaced by the 
ball B. In Case 1, let F = \l/y; in Case 2 let F = $x- Assume in addition the 
following: 

Case 1. Assume ||F(po)|| < (1 ~ ^i)P- 

Case 2. Let = ki^k2{ki^e + l)/2 and if k^ ^ assume that 

oo 

j2K\hK'\\x{po)\\r < p. 

n=0 

Then in each case the sequence {F"(po)} I'i^s in B and converges to a fixed point of 
F that lies in B. 

The distances d{pn+i,Pn) in Case 1 have the exponential falloff given by \5. and 
in Case 2 have the super-exponential falloff given by i5.1(J\) . | 



Theorem 15.21 is most useful when one knows ahead of time that there is at most 
one fixed point. This is exactly the case for averaging algorithm HI'^Yq) used in 
m since we do not need the contracting-mapping apparatus to prove uniqueness — 
given existence, we already know from Kendall's theorem that if Q is contained in 
regular geodesic ball B then "^q := "^Yq has at most one fixed point in B. This leads 
immediately to the following strengthening of certain portions of Theorem 14.81 

Theorem 5.3 Let po G M, < ri < rregcvx(po)/ for < p < ri write Bp for Bp{po). 
Let A{-),6{-) be continuous monotone upper and lower bounds on curvature as in 
Definition \4.4]f b)- Define numbers D'^^.^^, D'^^^, p'^^.^^ and p\ analogously to the numbers 
defined in Lemma \J~5[ but with s replaced by the function 

§scq(A, 6; p, D) = {1- /iseq(A, 6; p, D))p (5.11) 

where 

^seq(A, ~5- p, D)) = ilj^,,{6{p), A{p),p + D). (5.12) 

Then Statements\^ and^ of Theorem \4-<^ hold with -Dcrit, -Dcrit, Pi, and pi replaced by 
p^, and p[ respectively. Assume thatQ C Bd (or more generally ||YQ(po)il < 
D ) and that either 

(t) D < D[,^„ or 

(a) D < -D^nax '^'^'^ tethered to Q (see Definition s.^ . 

Then the sequence of iterates {^E'q(po)} converges to the primary center of mass 
q of {Q,p), and q lies in q E -Bpo(-D) H ohull(Q). The entire sequence lies in -Bp;(i:)) 
(hence in the D -independent ball Bp>^^,J if (i) holds, and in Bp'^(^D) if (ii) holds. ^ 
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We have a corresponding strengthening of Corollary 14.111 

Corollary 5.4 Corollary \4 .11] remains true if the numbers Dcnt{q) are replaced by 
the larger numbers D'^^^^{q) defined in Theorem, \5.^ | 



For the map = ^^y-^ used in Theorem 15.31 and Corollary 15.41 we have a bound 
on the endomorphism A that, while not as strong as (j5.7|) . is better than for the 
general ^y- From ()4.9|) and ()4.6|) . if Q C Bd{pq) then on Bpij)^) we have 

\\A\\ < ^^UHBp{Po)),MBp{po)),P + D) = ^\K\{p + Df + 0{\K\\p + DY) (5.13) 

where \K\ = ma.x{6{Bp{pQ)), A{Bp{pQ)). Initialize the algorithm at a point pq E Q 
as in Corollary 14.111 let D = dia.m{Q), and assume that D < -Dcrit(Q) as in the 
corollary. From Theorem 15.21 "^n preserves the convex ball Bp-^^po), where pi(-D) is 
the smallest positive number p satisfying s{p, D) = D, and hence when applying 
the bound ()5.13|) in the analysis of {"^qipo)} it suffices to take p = pi{D). Since 
s{p, D) = p(l - 0{\K\{p + for D small we have pi{D) = D{1 + 0{\K\D^)). 

Thus 

\\A\\ < ^|ir|diam(g)2 + 0(|irpdiam(g)"), (5.14) 

which we can use for ei in ()5.fi|l and ()5.9|1 . Thus for any £2 > 0, if \K\ ■ diam((5)^ is 
small enough we have 

4 

d{pn+i,Pn) < {- + e2)\K\diam{Qfd{pn,Pn-i) (5.15) 
= di&m{Qfd{pn,Pn-i), (5.16) 



so in place of (j5.9p we can write 

d{pn+i,Pn) 
diam(Q)" 



<d(pi,Po)(fc5diam(g))". (5.17) 



In other words, as diam((5) 0, the falloff rate of successive distances in the 
averaging algorithm is geometric even relative to diam(Q). The bound ()5.10p shows 
that we would get even faster convergence to the center of mass if we iterated the 
map instead of "^Yq ■ However, as a practical tool ^Yq has the disadvantage that 
one must compute and invert VYq, which may be difficult even if M has constant 
curvature, whereas for many more general spaces the algorithm HI'^Yq) is easily 
programmable. 

Remark 5.5 Since A = VY+I, for diam(Q) small we can think of (j5.13p as asserting 
that the vector field Yq is, in some sense, very nearly linear. From this point of view 
it is no surprise that the convergence of the algorithm is so rapid — what we are using 
is almost Newton's method for an almost linear function. 
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As D 0, the bound ()5.15|) can be improved by using the circumradius of Q 
instead of its diameter in this estimate (see the discussion after Corollary 14.111) . In 

R", one always has circumrad(Q) < -^J 2{n+i) ^^^^iQ)y with a regular n-simplex an 
extremal configuration. In a general Riemannian manifold, if we restrict attention 
to sets Q contained in a subset U on which the there are bounds on the curvature 
and a positive lower bound on the injectivity radius, then as D — >■ the number 
sup{circumrad((5)/diam(Q) \ Q C U, < diam((5) < D} tends to its Euchdean 
value. Thus we obtain an asymptotic bound ei ~ |^^AD^, where n = dim(M). 



6 Averaging in the case of non-negative curvature 

When (M, g) has curvature of a fixed sign, the definitions of the critical radii in Theo- 
rem l5.3l and Corollarv 15 . 41 simplify, since we can globally replace 4'raa.x{Sp,p, ^p,p, P+D) 
in ()5.12|) by either either ip{A{p), p + D) or il:{S{p), p + D). In this section we assume 
that the curvature is non-negative, which is true in all shape spaces and size-and-shape 
spaces. 

The goal of this section is to estimate the critical radii appearing in Theorem 15.31 
as well as the convergence rate of the averaging algorithm (not merely the asymptotics 
of this rate as diam(Q) — >■ 0). To simplify the estimates further, we will assume a 
uniform upper bound A = A on sectional curvature in all the balls that appear in this 
section, and a uniform lower bound ri on the regular convexity radius of the center 
of any such ball. We assume A > strictly since the fiat case is not very interesting, 
the algorithm converging at the first iteration. 

Notation in this section will be for the most part as in §SH3 but it is convenient 
to define rescaled variables p = A^^^p, D = A^^'^D, and a rescaled function s = A^/^5 
of the rescaled variables (where in the definition of s we take 5 = 0, A = A). We 
also write k for k expressed in terms of the rescaled variables. We suppress all the 
parameters except D and p in most formulas below. 

Fix pq E M and let x = p + D. Then 

R{p, D) = k{x) := ^/'(l, x) = 1 — X cot x = -x^ + 0{x^) (6.1) 

and 

-s{-p,D) = {l-R{-p,D))-p. (6.2) 

Since A, 5 are constant, s is different iable, so the rescaled pair (pcrit, -Dcrit) from 
Lemma 14.51 can be characterized as the unique solution of the system of equations 

7s{-p,D) = D, (6.3) 
-{p,D) = (6.4) 
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in (0, 7r/2) x (0, vr/2), provided that pcrit as defined this way is less than A^/^ri. For this 
system of equations, Maple's f solve routine"^ yields p[,j,jt ~ .6816 > .21697r, Z)^j,j^ ^ 
.3952 > .12587r. Thus 

p^t > min(ri, .21697rA-^/2)^ ^^^.^ > min(ri, .12587rA-^/2). (6.5) 
From these numbers we also compute p'^{D'^^^^) ^ min(ri, 

1.1566A-1/2) ~ .36827rA-i/2_ 

Centering all balls below at po and writing Bp for Sp(po), we recall what the num- 
bers just computed tell us: from Theorem 15. 3[ for any {Q,fi) with Q in the ball of 
radius -Dcrit; any p in the ball of radius p^j.it) ^^e sequence converges to 

the primary center of mass of (Q,p). If "^q is tethered to Q, to conclude conver- 
gence we need only assume that Q and p lies in the balls of radius D'^nt PK-^crit) 
respectively. 

HQ C Bo then as D 0, the algorithm converges on larger and larger sets, 
the balls of radius p'^{D) (or p'^{D) in the tethered case). These radii approach 
P3(0) = P4(0) = min(ri, (7r/2)A^/^). Thus as Z) — >• we get convergence on balls of 
radius arbitrarily close to (but smaller than) the largest radius for which Kendall's 
theorem f Theorem 13. 7|) guarantees uniqueness of the center of mass. 

Remark 6.1 Corollarv l4.7l the existence/uniqueness theorem given by the contracting- 
mapping approach, guarantees existence of the center of mass of a distribution sup- 
ported in a ball of radius -D^j.;,.; in Karcher's result, the .12587r in ()6.5p is replaced 
by the better 7r/4. To compare the uniqueness statement in Corollarv 14. 71 with those 
of Karcher and Kendall, we cannot use the radii above, coming from Theorem 15.31 
but must go back to those in Theorem 14.81 This has the effect of replacing il){l,x) 
in (|6.ip by 0-(a;) -|- ^'(l,^;) = -|- 0{x'^). In this case we analogously compute 
-Dcrit ~ min(T'i, .09047rA-i/2^ and P4(£'crit) ~ min(ri, .27777rA-i/2)5^ Corollary O 
implies that if Q is contained in the ball of radius -Dcrit, then {Q,fi) has a unique 
center of mass in the ball of radius p4(I^crit)- Thus in the non- negative curvature 
case, for D < -Dcrit the contracting-mapping approach, while giving not as strong 
a uniqueness statement as in Kendall's theorem, gives a slightly stronger statement 
than in Karcher's original theorem, which has only 7r/4 in place of our worst-case 
constant .27 777r. 

We next estimate the convergence rate of {pn = \I/q(po)}; assuming that Q lies in 
the ball of radius -Dcrit- From Theorem 15.31 the sequence stays in the ball of radius 
p^j.it(-D)) oil which, letting A = VYq + I and writing Xcrit = Pcrit(-^) + -^crit) ^^e bound 
()5.13|) gives 

\\A\\ < jj^^^ jO, 1, Xcrit) = /i(xcrit) < .4202 (6.6) 

^AU numerical calculations in this section were done with Maple. 

^Thesc numbers increase slightly if (M,g) is further assumed to be locally symmetric, since 
instead of 4>-{x) we can then use the smaller quantity (j)+{x) — 4>-{x) — ■j^x'^ + 0{x^). In this case 
we can replace .09047r by .09327r, and .2 7 777r by .299l7r. The improvement is so marginal because 
4>+{x) and (j)-(x) differ by only j^x* + 0{x^). 
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Hence we obtain the geometric convergence rate ()5.6p with ei = .4202. 

If we start with po & Q and assume D = diam((5) < D'ait in Corollary 15. 4| 
then as D decreases we can sharpen the convergence-rate estimate by replacing a;crit 
with p'i(-D) + -D in the previous estimate. Since k is monotone increasing on [0, Xcrit], 
and s{pi{D),D) = D, we have pi{D) < yz^§-;) := CiD < 1.725^. The function 
X ^— s> k{x)/x'^ is monotone increasing on [0, vr), so for x G [0, Xcrit] we have < k{x) < 
{k{xcrit)/xl^-^^)x'^ := C2X^. Thus \\A\\ < C2(l + ci^AD^ < 2.Q90AD^, so we can take 
ei = 2.690 AL)2 in ^ and 

As -D — ^ 0, this can be improved further — ()5.15|) gives a bound on ei asymptotic 
to |AD^, and as noted at the end of §Slthis can even be reduced to |^^AD^, where 
n = dim(M). 

Finally, we consider two simple examples: round spheres and complex projective 
spaces, with standard metrics. If M is a round sphere of radius R, then the curvature 
is constant and equal to R~^, and rcvx{M) = rreg(M) = tiR/2. Hence we can take 
A~^/^ = R and erase "min","ri" and the tildes in all the estimates above; e.g. in 
place of ()fi.5|l we have simply 

pVt > .21697ri?, > .12587r/? (6.7) 

Similarly, CP" with a Fubini-Study metric (unique up to scale) is a symmetric space 
of positive curvature. If we fix the scale by taking the metric to be the one for 
which the standard projection from the unit sphere 5'^""'"^ — >■ CP" is a Riemannian 
submersion, then the sectional curvatures of CP" run between 5 = 1 and A = 4 
if n > 2 (the curvature is identically 4 if n = 1; CP^ with this metric is a round 
sphere of radius 1/2). In this case we have rcvx(^) = 7r/4 and Ar^^"^ = 1/2, so the 
critical radii are exactly half those for the unit sphere; bounds are given by ()6.7p with 
R = 1/2. It is not hard to show that E2, the shape space of k points in R^, is exactly 
CP^""^ with this metric (if k > 2) jT^j, so the numbers above directly relate to the 
behavior of the Riemannian averaging algorithm on this shape space. 

7 Appendix 

7.1 Proof and discussion of Proposition l2T3l 

In this subsection, hypotheses and notation are as in Proposition 12.31 We first prove 
()2.10|) and then discuss how to sharpen this bound for locally symmetric spaces; the 
bound ()2.12p follows as a special case of this discussion. 

Proof of n2.10j) . jll and J"*", the components of X parallel and perpendicular 
to 7', are themselves Jacobi fields, with J"(t) = {at + c)y{t) for some a, c G R. 
Each of j" and J-*" satisfies antidiagonal initial conditions. In particular, c = —a, so 
Jll(l) = 0. Hence X(l) = J^(l), so it suffices to prove ()2.10|) under the assumption 
that V _L 7'(0), which we make henceforth. 
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Let {cjIq"^, where n = dim(M), be an orthonormal basis of TpM with eo = 
7'(0)/||7'(0)||, and extend each e, along 7 by parallel translation. Write J(t) = 
Yl^=i f\'t)^i{'t) s-iid let / : [0, 1] — > R"~^ be the vector- valued function whose com- 
ponents are the note that Euclidean = ||-'^(^)||- Then (12. 2j) simply becomes 

fit) = A{t)f{t) (7.1) 

for a certain (n — 1) x [n — 1) matrix- valued function A whose operator norm satisfies 
11^(^)11 < l-^l(7(^))ll7'(^)P- The norm of 7'(t) is constant and equal to the length r 
of 7. Letting b = \K\{'~f), we therefore have ||v4(t)|| < 6r^. 

For V G TpM write v = ^f*ej, and let v G R" be the vector whose components 
in the standard basis are the v\ The initial conditions for then become /(O) = 
— /'(O) = V. The unique solution of (jT.lj) with these initial conditions is given explicitly 
by the series 



f(t) = {l-t)v+ / {l-ti)A{ti)v dtidt2 
Jo Jo 



rt pti pts pt2 

^ -h)A{t3)A{ti)v dtidt2dt3dti 

Jo Jo Jo Jo 

+ ...+ ... (1- tl)A(t2m-l)A(t2m-3) ■ ■ ■ A{t,)v dti... dh^m 

J J0<ti<t2...<t2,n<t 

+ ... (7.2) 

(This series converges in norm uniformly on any compact t-interval.) In the 2m-fold 
integral the integrand is bounded in norm by (1 — ti)6™r^''"||f || provided < t < 1, 
the only case we are interested in. Integrating explicitly, we obtain ||f Hfo'^r^ 

^2m+l 



^(2m)! 

^^^^-^-^1 ; as an upper bound on the 2m-fold integral. Hence for < t < 1 we have 



°° j.2m f2m+l 

= (cosh(6VV()-5l!^!«)ll„l|, 

Plugging in t = 1, the bound ()2.1()|1 follows. 



In contrast to more frequently-seen bounds on Jacobi fields, the sign of the sec- 
tional curvature does not play a role in ()2.1()j) . The reason is the anti-diagonal initial 
condition, which in Euclidean space leads to J(l) = 0. If M is positively curved, then 
II J|| can reach before time 1 and then grow again, so that || J(l)|| cannot be bounded 
by its Euchdean analog. However, while it is not obvious how to get the best bound 
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in Proposition 12 . 31 for general manifolds, or even for nonnegatively curved manifolds, 
the analysis simplifies considerably for locally symmmetric spaces (manifolds whose 
Riemann tensor is covariantly constant; examples are S"^ and CP"). In this case 
the matrix A{t) in (jT.lj) is a constant symmetric matrix r^A, and the solution ()7.2j] 
collapses to 

f(t) = (c(tV2i) - t s{t^r^A))v (7.3) 
(see Table 1 in §2.) Hence in this case ()2.10j) can be improved to 

||J,(1)|| < \\c{r'A)-s{r'A)\\ \\v^\\. (7.4) 

We can always choose an orthonormal basis in which the matrix A in the proof above 
is diagonal, say A = diag(Ai, . . . , A„_i). Then c(r^A) — s(r^A) becomes a diagonal 
matrix with entries sign(Ai) ■ 0sign(Ai)(|Ai|^/^r). The sectional curvatures of M range 
between 6 < min{Aj} and A > max{Aj} (we would have equality here if we replaced 
6 and A by the minimum and maximum sectional curvatures achieved on 2-planes 
tangent to 7) and are increasing functions on appropriate intervals: on [0, 00) 
(the Taylor coefficients are all nonnegative), 0+ on [0,Xo], where Xq ~ 0.877r is the 
first positive solution of (x^ — 1) sinx + xcosx = 0. Hence 

( 0+(AV2r) if < (5 < A and A^/V < xq, 

\\c{r'^A)-s{r^A)\\ < < max(0_(|(5|i/V),0+(Ai/2r)) if 5 < < A and A^V < xq, 

[ (f)4\6\^/^r) if 5 < A < 0. 

(7.5) 

Thus for a locally symmetric space we can replace 0_(r 17^1(7)^/^) in ()2.1()j) by the 
appropriate line of ()7.5p : the top line yields ()2.12|) . since Xq > 37r/4. (We chose 37r/4 
in Proposition 12.31 for simplicity. Values of 0+ that equal or exceed 1 are irrelevant 
to us since in Theorem I2.8r b) they lead to a useless bound on n. The first positive 
X for which 0+(x) = 1 is approximately .7477, so the restriction A^/^r < 37r/4 more 
than suffices for our considerations.) 

If M has constant curvature — i.e. all sectional curvatures are equal, say to A — 
then the matrix in ()7.3|) is a multiple of the identity, leading us to sharp equality. In 
this case A = —AI so we obtain 

||J.(l)||=0±(|A|VV)||r;^|| (7.6) 

where 0+ is used if A > 0, and 0_ if A < 0. 

7.2 The Hessian of the squared distance function 

Good references for the material in this subsection are J3], §5 and Appendix C. 

The lemma below was used in Lemma 14.11 and Corollary 14.71 The useful bound 
()7.9|) is essentially proven in [2] Chapters 4-5, but is not explicitly stated in this form. 
(Theorem 5.2 of ^3] asserts an inequality that looks identical to (j7.9|) . but because 
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Hildebrandt's goal in [T?] is a simple upper bound that applies to all vectors, not 
just those orthogonal to 7', he imposes the requirement S < 0.) The block-diagonal 
decomposition of the Hessian indicated in the lemma must generally be used in order 
to get the sharpest estimates on ||VF + I\\ when Y is the gradient of a function of 
the form p ^ Jq f{d{p, q)) dfi{q). 

Lemma 7.1 Let p,q E M with d{p,q) < Tinj(Q') and let H = Hess(|r^)|p. Let 7 : 
[0, 1] ^ M be the minimal geodesic from q to p, let u a unit vector tangent to 7 at p, 
and let C TpM he the orthogonal complement o/span(M). Let 5 and A he lower 
and upper hounds, respectively, for the sectional curvatures of M along 7; z/ A > 
also assume d{p, q) < ttA^^/^. Then for all v G we have the following: 



Proof: Recall that for any function /, vectors X, F G TpM, and an arbitrary smooth 
extensions of X, Y to vector fields on a neighborhood of the covariant Hessian Hf 
is given by 



Gauss lemma implies Z\rq) = along 7, fj7.8|) is trivial as well. The bound ()4.8p can 
derived from the normal- Jacobi-field estimate ^Ij Theorem 4.2, followed by rescaling 
the arclength parameter as at the bottom of ^3] p. 53, and then restricting the proof 
of fT^ Theorem 5.2 to the case of vectors orthogonal to the geodesic. | 
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